Marlon Gaviria | Samuel Agudelo | Federico Milotta

Nicolas Sarmiento | Carolina Vergara

Session 4.7.2

Exercise 10

This question should be answered using the Weekly data set, which is part of the ISLR package. This data is similar in nature to the Smarket data from this chapter’s lab, except that it contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.

A

Produce some numerical and graphical summaries of the Weekly data. Do there appear to be any patterns?

dim(Weekly) # 9 variables with 1089 observations
## [1] 1089    9
summary(Weekly)
##       Year           Lag1               Lag2               Lag3         
##  Min.   :1990   Min.   :-18.1950   Min.   :-18.1950   Min.   :-18.1950  
##  1st Qu.:1995   1st Qu.: -1.1540   1st Qu.: -1.1540   1st Qu.: -1.1580  
##  Median :2000   Median :  0.2410   Median :  0.2410   Median :  0.2410  
##  Mean   :2000   Mean   :  0.1506   Mean   :  0.1511   Mean   :  0.1472  
##  3rd Qu.:2005   3rd Qu.:  1.4050   3rd Qu.:  1.4090   3rd Qu.:  1.4090  
##  Max.   :2010   Max.   : 12.0260   Max.   : 12.0260   Max.   : 12.0260  
##       Lag4               Lag5              Volume            Today         
##  Min.   :-18.1950   Min.   :-18.1950   Min.   :0.08747   Min.   :-18.1950  
##  1st Qu.: -1.1580   1st Qu.: -1.1660   1st Qu.:0.33202   1st Qu.: -1.1540  
##  Median :  0.2380   Median :  0.2340   Median :1.00268   Median :  0.2410  
##  Mean   :  0.1458   Mean   :  0.1399   Mean   :1.57462   Mean   :  0.1499  
##  3rd Qu.:  1.4090   3rd Qu.:  1.4050   3rd Qu.:2.05373   3rd Qu.:  1.4050  
##  Max.   : 12.0260   Max.   : 12.0260   Max.   :9.32821   Max.   : 12.0260  
##  Direction 
##  Down:484  
##  Up  :605  
##            
##            
##            
## 
cor(Weekly[, -9]) # Year and Volume are highly correlated
##               Year         Lag1        Lag2        Lag3         Lag4
## Year    1.00000000 -0.032289274 -0.03339001 -0.03000649 -0.031127923
## Lag1   -0.03228927  1.000000000 -0.07485305  0.05863568 -0.071273876
## Lag2   -0.03339001 -0.074853051  1.00000000 -0.07572091  0.058381535
## Lag3   -0.03000649  0.058635682 -0.07572091  1.00000000 -0.075395865
## Lag4   -0.03112792 -0.071273876  0.05838153 -0.07539587  1.000000000
## Lag5   -0.03051910 -0.008183096 -0.07249948  0.06065717 -0.075675027
## Volume  0.84194162 -0.064951313 -0.08551314 -0.06928771 -0.061074617
## Today  -0.03245989 -0.075031842  0.05916672 -0.07124364 -0.007825873
##                Lag5      Volume        Today
## Year   -0.030519101  0.84194162 -0.032459894
## Lag1   -0.008183096 -0.06495131 -0.075031842
## Lag2   -0.072499482 -0.08551314  0.059166717
## Lag3    0.060657175 -0.06928771 -0.071243639
## Lag4   -0.075675027 -0.06107462 -0.007825873
## Lag5    1.000000000 -0.05851741  0.011012698
## Volume -0.058517414  1.00000000 -0.033077783
## Today   0.011012698 -0.03307778  1.000000000

The data set contains 9 variables and 1089 observations. Looking at the correlation matrix, we see a strong positive dependence between Year and Volume (0.84); among the remaining variables there are no substantial correlations.

plot(Volume ~ Year, data = Weekly)

set.seed(2020)
plot(Volume ~ jitter(Year, factor = 2.5), data = Weekly, 
     xlab = "Year", ylab = "Volume (billions)")

The positive linear dependence between Volume and Year is evident; however, from about 2005 onward this relationship is less clear-cut, as the data become considerably more dispersed.

B

Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Use the summary function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones?

logit_mod1 <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
                 data = Weekly, family = binomial(link = "logit"))

summary(logit_mod1) 
## 
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + 
##     Volume, family = binomial(link = "logit"), data = Weekly)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6949  -1.2565   0.9913   1.0849   1.4579  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  0.26686    0.08593   3.106   0.0019 **
## Lag1        -0.04127    0.02641  -1.563   0.1181   
## Lag2         0.05844    0.02686   2.175   0.0296 * 
## Lag3        -0.01606    0.02666  -0.602   0.5469   
## Lag4        -0.02779    0.02646  -1.050   0.2937   
## Lag5        -0.01447    0.02638  -0.549   0.5833   
## Volume      -0.02274    0.03690  -0.616   0.5377   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1496.2  on 1088  degrees of freedom
## Residual deviance: 1486.4  on 1082  degrees of freedom
## AIC: 1500.4
## 
## Number of Fisher Scoring iterations: 4

At the 0.05 significance level, the only statistically significant predictor is Lag2 (p = 0.0296).
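Since Lag2 is the only significant coefficient, its magnitude can also be read on the odds scale. A minimal sketch, assuming the fitted `logit_mod1` object from the chunk above is still in the workspace:

```r
# Exponentiating the Lag2 coefficient gives an odds ratio: each one-unit
# increase in Lag2 multiplies the odds of an "Up" week by roughly exp(0.05844)
exp(coef(logit_mod1)["Lag2"])
# Wald 95% confidence interval on the same odds-ratio scale
exp(confint.default(logit_mod1)["Lag2", ])
```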

C

Compute the confusion matrix and overall fraction of correct predictions. Explain what the confusion matrix is telling you about the types of mistakes made by logistic regression.

# Predicted probabilities
predlogit_mod1 <- predict(logit_mod1, type = "response")

contrasts(Weekly$Direction)
##      Up
## Down  0
## Up    1
predlogit_mod1 <- ifelse(predlogit_mod1 > 0.5, yes = "Up", no = "Down")

# Confusion matrix and fraction of correct predictions
confusion <- function(prediction, data){
  real <- as.character(data$Direction)
  a <- sum(real=="Down" & prediction=="Down") # true negatives
  b <- sum(real=="Up" & prediction=="Down")   # false negatives
  c <- sum(real=="Down" & prediction=="Up")   # false positives
  d <- sum(real=="Up" & prediction=="Up")     # true positives
  CM <- data.frame("1" = c("Prediction","Down","Up"), 
                   "2" = c("Down", a, c),
                   "3" = c("Up", b, d))
  names(CM) <- c("", "Real", "")
  ACC <- (a + d) / (a + b + c + d) # accuracy = 1 - error
  ERR <- (b + c) / (a + b + c + d) # error = 1 - accuracy
  
return(list(confusion_matrix = CM, 
            accuracy = ACC,
            error = ERR))
}

confusion(predlogit_mod1, Weekly)
## $confusion_matrix
##              Real    
## 1 Prediction Down  Up
## 2       Down   54  48
## 3         Up  430 557
## 
## $accuracy
## [1] 0.5610652
## 
## $error
## [1] 0.4389348

Taking Up as the positive class (per contrasts(), Up = 1):

- 557 weeks were correctly classified as Up (true positives).

- 54 weeks were correctly classified as Down (true negatives).

- 430 weeks that were actually Down were classified as Up (false positives).

- 48 weeks that were actually Up were classified as Down (false negatives).

The accuracy is 0.56, i.e. the fraction of observations the model classifies correctly: the model predicted the weekly direction correctly 56% of the time, while the remaining 44% of the data set is classified incorrectly.
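The accuracy reported by `confusion()` can be cross-checked with base R; a quick sketch, assuming `predlogit_mod1` from the chunk above:

```r
# Cross-tabulate predictions against the observed direction
table(Predicted = predlogit_mod1, Real = Weekly$Direction)
# Overall fraction of correct predictions (should match confusion()$accuracy)
mean(predlogit_mod1 == Weekly$Direction)
```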

D

Now fit the logistic regression model using a training data period from 1990 to 2008, with Lag2 as the only predictor. Compute the confusion matrix and the overall fraction of correct predictions for the held out data (that is, the data from 2009 and 2010).

test <- c(which(Weekly$Year == 2009), which(Weekly$Year == 2010))
train <- Weekly[-test, ]
test <- Weekly[test, ]

# Lag2 as the only predictor, with a logistic link
logit_mod <- glm(Direction ~ Lag2,
                 data = train, family = binomial(link = "logit"))
# Model summary
summary(logit_mod)
## 
## Call:
## glm(formula = Direction ~ Lag2, family = binomial(link = "logit"), 
##     data = train)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.536  -1.264   1.021   1.091   1.368  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  0.20326    0.06428   3.162  0.00157 **
## Lag2         0.05810    0.02870   2.024  0.04298 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1354.7  on 984  degrees of freedom
## Residual deviance: 1350.5  on 983  degrees of freedom
## AIC: 1354.5
## 
## Number of Fisher Scoring iterations: 4
# Predicted probabilities
predlogit_mod <- predict(logit_mod, newdata = test, type = "response")

predlogit_mod <- ifelse(predlogit_mod > 0.5, yes = "Up", no = "Down")

confusion(predlogit_mod, test) # The regression predicts correctly in 62.5% of cases
## $confusion_matrix
##              Real   
## 1 Prediction Down Up
## 2       Down    9  5
## 3         Up   34 56
## 
## $accuracy
## [1] 0.625
## 
## $error
## [1] 0.375

- 56 weeks were correctly classified as Up (true positives).

- 9 weeks were correctly classified as Down (true negatives).

- 34 weeks that were actually Down were classified as Up (false positives).

- 5 weeks that were actually Up were classified as Down (false negatives).

The regression produces many false positives: it frequently classifies as Up observations that are actually Down.

The accuracy is 0.625, i.e. the fraction of held-out observations classified correctly: the model predicted correctly 62.5% of the time, while 37.5% of the test set is classified incorrectly.
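For context, this 62.5% can be compared against a naive baseline that always predicts "Up"; a sketch using the `test` set defined above:

```r
# Accuracy of always predicting "Up" on the 2009-2010 test weeks:
# the proportion of Up weeks, (5 + 56) / 104
mean(test$Direction == "Up")
```

So the Lag2-only logistic regression beats the no-information baseline of roughly 58.7% only modestly.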

E

Repeat (d) using LDA.

library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
lda_mod <- lda(Direction ~ Lag2,
               data = train)

predlda_mod <- as.character(predict(lda_mod, test)$class)

confusion(predlda_mod, test) # LDA predicts correctly in 62.5% of cases
## $confusion_matrix
##              Real   
## 1 Prediction Down Up
## 2       Down    9  5
## 3         Up   34 56
## 
## $accuracy
## [1] 0.625
## 
## $error
## [1] 0.375

LDA gives exactly the same results as the logistic regression in the previous item.

F

Repeat (d) using QDA.

qda_mod <- qda(Direction ~ Lag2, 
               data = train)

predqda_mod <- as.character(predict(qda_mod, test)$class)

confusion(predqda_mod, test) # QDA predicts correctly in 58.7% of cases
## $confusion_matrix
##              Real   
## 1 Prediction Down Up
## 2       Down    0  0
## 3         Up   43 61
## 
## $accuracy
## [1] 0.5865385
## 
## $error
## [1] 0.4134615

- 61 weeks were correctly classified as Up (true positives).

- 0 weeks were correctly classified as Down (true negatives).

- 43 weeks that were actually Down were classified as Up (false positives).

- 0 weeks that were actually Up were classified as Down (false negatives).

The accuracy is 0.5865 and the misclassification rate is 0.4135. Note that QDA predicts Up for every test week, so its accuracy is simply the proportion of Up weeks in the test set.

G

Repeat (d) using KNN with K =1.

library(class)
predknn_mod <- as.character(knn(train[, 3, drop = FALSE], test[, 3, drop = FALSE],
                                cl = train$Direction, k = 1))

confusion(predknn_mod, test) # KNN with k = 1 predicts correctly in 51.0% of cases
## $confusion_matrix
##              Real   
## 1 Prediction Down Up
## 2       Down   21 29
## 3         Up   22 32
## 
## $accuracy
## [1] 0.5096154
## 
## $error
## [1] 0.4903846

- 32 weeks were correctly classified as Up (true positives).

- 21 weeks were correctly classified as Down (true negatives).

- 22 weeks that were actually Down were classified as Up (false positives).

- 29 weeks that were actually Up were classified as Down (false negatives).

The accuracy is 0.5096 and the misclassification rate is 0.4904, scarcely better than random guessing.

H

Which of these methods appears to provide the best results on this data?

The methods that appear to provide the best results are the logistic regression and LDA, both with 62.5% test accuracy.

I

Experiment with different combinations of predictors, including possible transformations and interactions, for each of the methods. Report the variables, method, and associated confusion matrix that appears to provide the best results on the held out data. Note that you should also experiment with values for K in the KNN classifier.

# Trying logistic regression
logit_mod2 <- glm(Direction ~ 1,
                 data = train, family = binomial(link = "logit"))
# Model summary
summary(logit_mod2)
## 
## Call:
## glm(formula = Direction ~ 1, family = binomial(link = "logit"), 
##     data = train)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.268  -1.268   1.090   1.090   1.090  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  0.20990    0.06408   3.276  0.00105 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1354.7  on 984  degrees of freedom
## Residual deviance: 1354.7  on 984  degrees of freedom
## AIC: 1356.7
## 
## Number of Fisher Scoring iterations: 3
# Predicted probabilities
predlogit_mod2 <- predict(logit_mod2, newdata = test, type = "response")

predlogit_mod2 <- ifelse(predlogit_mod2 > 0.5, yes = "Up", no = "Down")

confusion(predlogit_mod2, test) # The intercept-only model predicts correctly in 58.7% of cases
## $confusion_matrix
##              Real   
## 1 Prediction Down Up
## 2       Down    0  0
## 3         Up   43 61
## 
## $accuracy
## [1] 0.5865385
## 
## $error
## [1] 0.4134615

Fitting a logistic model with only the intercept produces the same results as QDA with Lag2 as the only regressor: both classify every test week as Up.

Trying LDA and QDA: adding more variables to LDA and QDA overfits the model, and for that reason the accuracy of the predictions decreases.

# Trying KNN
predknn_mod <- as.character(knn(train[, 3, drop = FALSE], test[, 3, drop = FALSE],
                                cl = train$Direction, k = 3))

confusion(predknn_mod, test) # KNN with k = 3 predicts correctly in 54.8% of cases
## $confusion_matrix
##              Real   
## 1 Prediction Down Up
## 2       Down   16 20
## 3         Up   27 41
## 
## $accuracy
## [1] 0.5480769
## 
## $error
## [1] 0.4519231

With k = 3 nearest neighbors (k is the number of neighbors in the vote, not the number of groups) the accuracy stops improving, so this value of k is the most suitable among those tried for obtaining correct predictions.
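Rather than trying k values one at a time, the scan can be automated; a sketch assuming `train`, `test`, and `library(class)` as above (`set.seed` because `knn()` breaks ties at random):

```r
set.seed(2020)
accs <- sapply(1:10, function(k) {
  pred <- knn(train[, 3, drop = FALSE], test[, 3, drop = FALSE],
              cl = train$Direction, k = k)
  mean(pred == test$Direction)  # test accuracy for this k
})
data.frame(k = 1:10, accuracy = accs)
```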

Exercise 11

In this problem, you will develop a model to predict whether a given car gets high or low gas mileage based on the Auto data set.

A

Create a binary variable, mpg01, that contains a 1 if mpg contains a value above its median, and a 0 if mpg contains a value below its median. You can compute the median using the median() function. Note you may find it helpful to use the data.frame() function to create a single data set containing both mpg01 and the other Auto variables.

library(ISLR)
str(Auto)
## 'data.frame':    392 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : num  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : num  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...

We have a data set with 9 variables and 392 observations; all variables are numeric except ‘name’, which is a factor.

mediana <- median(Auto$mpg)
mpg01 <- ifelse(Auto$mpg > mediana, yes = 1, no = 0)

newAuto <- cbind(mpg01, Auto[, -1])

summary(newAuto)
##      mpg01       cylinders      displacement     horsepower        weight    
##  Min.   :0.0   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
##  1st Qu.:0.0   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
##  Median :0.5   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
##  Mean   :0.5   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
##  3rd Qu.:1.0   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
##  Max.   :1.0   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
##                                                                              
##   acceleration        year           origin                      name    
##  Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
##  1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
##  Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
##  Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
##  3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
##  Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
##                                                  (Other)           :365

The summary above shows the minimum and maximum values of each variable, along with its mean and median.

library(PerformanceAnalytics) 
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
chart.Correlation(newAuto[, -c(9)], histogram = TRUE, method = "pearson")

B

Explore the data graphically in order to investigate the association between mpg01 and the other features. Which of the other features seem most likely to be useful in predicting mpg01?Scatterplots and boxplots may be useful tools to answer this question. Describe your findings.

# Keep the covariates numeric (not converted to categorical) so as not to violate normality
newAuto$mpg01 <- as.factor(as.character(newAuto$mpg01))

plot(cylinders ~ mpg01, data = newAuto)

plot(year ~ mpg01, data = newAuto)

We use the boxplots to examine the behaviour of each predictor across the two levels of the response. The group distributions look clearly different between mpg01 = 0 and mpg01 = 1, so it would be worthwhile to include these variables in the models.
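The same kind of comparison can be drawn for the remaining numeric predictors; a sketch assuming `newAuto` with `mpg01` as a factor, as above:

```r
# Boxplots of each candidate predictor by mpg01 group
par(mfrow = c(2, 2))
for (v in c("displacement", "horsepower", "weight", "acceleration")) {
  boxplot(newAuto[[v]] ~ newAuto$mpg01, xlab = "mpg01", ylab = v)
}
par(mfrow = c(1, 1))
```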

C

Split the data into a training set and a test set.

Training and test sets

set.seed(2020)
test <- sample(1:nrow(newAuto), size = nrow(newAuto)*0.7, replace = FALSE)
# Training set
train <- newAuto[test, ]
# Test set
test <- newAuto[-test, ]

Computation of the confusion matrix and fraction of correct predictions

confusion <- function(prediction, data){
  real <- data$mpg01
  a <- sum(real=="0" & prediction=="0") # true negatives
  b <- sum(real=="1" & prediction=="0") # false negatives
  c <- sum(real=="0" & prediction=="1") # false positives
  d <- sum(real=="1" & prediction=="1") # true positives
  CM <- data.frame("1" = c("Prediction","0","1"), 
                   "2" = c("0", a, c),
                   "3" = c("1", b, d))
  names(CM) <- c("", "Real", "")
  ACC <- (a + d) / (a + b + c + d) # accuracy = 1 - error
  ERR <- (b + c) / (a + b + c + d) # error = 1 - accuracy
  
  return(list(confusion_matrix = CM, 
              accuracy = ACC,
              error = ERR))
}

D

Perform LDA on the training data in order to predict mpg01 using the variables that seemed most associated with mpg01 in (b). What is the test error of the model obtained?

library(MASS)
lda_mod <- lda(mpg01 ~ cylinders + displacement + horsepower + weight + year,
               data = train)

predlda_mod <- as.character(predict(lda_mod, test)$class)

confusion(predlda_mod, test) 
## $confusion_matrix
##              Real   
## 1 Prediction    0  1
## 2          0   48  3
## 3          1    8 59
## 
## $accuracy
## [1] 0.9067797
## 
## $error
## [1] 0.09322034

The test error is 9.3%.

E

Perform QDA on the training data in order to predict mpg01 using the variables that seemed most associated with mpg01 in (b). What is the test error of the model obtained?

qda_mod <- qda(mpg01 ~ cylinders + displacement + horsepower + weight + year,
               data = train)

predqda_mod <- as.character(predict(qda_mod, test)$class)

confusion(predqda_mod, test) 
## $confusion_matrix
##              Real   
## 1 Prediction    0  1
## 2          0   49 12
## 3          1    7 50
## 
## $accuracy
## [1] 0.8389831
## 
## $error
## [1] 0.1610169

The test error is 16.1%.

F

Perform logistic regression on the training data in order to predict mpg01 using the variables that seemed most associated with mpg01 in (b). What is the test error of the model obtained?

logit_mod <- glm(mpg01 ~ cylinders + displacement + horsepower + weight + year,
                 data = train, family = binomial(link = "logit"))
# Model summary
summary(logit_mod)
## 
## Call:
## glm(formula = mpg01 ~ cylinders + displacement + horsepower + 
##     weight + year, family = binomial(link = "logit"), data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3105  -0.1464  -0.0013   0.2369   3.2283  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -12.513987   6.350907  -1.970  0.04879 *  
## cylinders     -0.592639   0.558038  -1.062  0.28823    
## displacement   0.006059   0.012949   0.468  0.63985    
## horsepower    -0.022678   0.019656  -1.154  0.24861    
## weight        -0.004536   0.001191  -3.807  0.00014 ***
## year           0.385479   0.092263   4.178 2.94e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 379.71  on 273  degrees of freedom
## Residual deviance: 104.34  on 268  degrees of freedom
## AIC: 116.34
## 
## Number of Fisher Scoring iterations: 7
# Predicted probabilities
predlogit_mod <- predict(logit_mod, newdata = test, type = "response")

predlogit_mod <- ifelse(predlogit_mod > 0.5, yes = "1", no = "0")

confusion(predlogit_mod, test) 
## $confusion_matrix
##              Real   
## 1 Prediction    0  1
## 2          0   47  6
## 3          1    9 56
## 
## $accuracy
## [1] 0.8728814
## 
## $error
## [1] 0.1271186

The test error is 12.7%.

G

Perform KNN on the training data, with several values of K ,in order to predict mpg01. Use only the variables that seemed most associated with mpg01 in (b). What test errors do you obtain? Which value of K seems to perform the best on this data set?

library(class)
predknn_mod <- as.character(knn(train[, -c(1, 6, 8, 9), drop = FALSE],
                                test[, -c(1, 6, 8, 9), drop = FALSE],
                                cl = train$mpg01, k = 3))

confusion(predknn_mod, test) 
## $confusion_matrix
##              Real   
## 1 Prediction    0  1
## 2          0   46 12
## 3          1   10 50
## 
## $accuracy
## [1] 0.8135593
## 
## $error
## [1] 0.1864407

With K = 3, the test error is 18.6%.
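To compare several values of K systematically, and since `knn()` uses Euclidean distance on raw units (so `weight` would otherwise dominate `cylinders`), the predictors can also be standardized first. A sketch assuming the `train`/`test` split from (c):

```r
set.seed(2020)  # knn() breaks ties at random
vars <- c("cylinders", "displacement", "horsepower", "weight", "year")
tr <- scale(train[, vars])
te <- scale(test[, vars], center = attr(tr, "scaled:center"),
            scale = attr(tr, "scaled:scale"))
errs <- sapply(c(1, 3, 5, 10, 20), function(k)
  mean(knn(tr, te, cl = train$mpg01, k = k) != test$mpg01))
data.frame(K = c(1, 3, 5, 10, 20), test_error = errs)
```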

Exercise 12

This problem involves writing functions.

A

Write a function, Power(), that prints out the result of raising 2 to the 3rd power. In other words, your function should compute 2^3 and print out the results. Hint: Recall that x^a raises x to the power a. Use the print() function to output the result.

Power <- function(x) print(2^3)
Power()
## [1] 8

B

Create a new function, Power2(), that allows you to pass any two numbers, x and a, and prints out the value of x^a. You can do this by beginning your function with the line Power2 = function(x, a) {. You should be able to call your function by entering, for instance, Power2(3, 8) on the command line. This should output the value of 3^8, namely, 6,561.

Power2 <- function(x, a) print(x^a)
Power2(3, 8)
## [1] 6561

C

Using the Power2() function that you just wrote, compute 10^3, 8^17, and 131^3.

Power2(10, 3)
## [1] 1000
Power2(8, 17)
## [1] 2.2518e+15
Power2(131, 3)
## [1] 2248091

D

Now create a new function, Power3(), that actually returns the result x^a as an R object, rather than simply printing it to the screen. That is, if you store the value x^a in an object called result within your function, then you can simply return() this result, using the following line: return(result). The line above should be the last line in your function, before the } symbol.

Power3 <- function(x, a) return(x^a)

E

Now using the Power3() function, create a plot of f(x) = x^2. The x-axis should display a range of integers from 1 to 10, and the y-axis should display x^2. Label the axes appropriately, and use an appropriate title for the figure. Consider displaying either the x-axis, the y-axis, or both on the log-scale. You can do this by using log = "x", log = "y", or log = "xy" as arguments to the plot() function.

x <- 1:10
y <- Power3(x, 2)
plot(x, y, pch = 19, las = 1, xaxt = "n", 
     main = bquote(f(x) == x^2 ~ with ~ 1 <= ~ x <= 10))
axis(1, at = 1:10, labels = 1:10)

F

Create a function, PlotPower(), that allows you to create a plot of x against x^a for a fixed a and for a range of values of x. For instance, if you call PlotPower(1:10, 3) then a plot should be created with an x-axis taking on values 1, 2, ..., 10, and a y-axis taking on values 1^3, 2^3, ..., 10^3.

PlotPower <- function(x, a){
  y <- x^a
  plot(x, y, pch = 19, las = 1, xaxt = "n", yaxt = "n", 
       main = bquote(f(x) == x^ ~ .(a) ~ with ~ .(x[1]) <= ~ x <= .(x[length(x)])))
  axis(1, at = x, labels = x)
  axis(2, at = x^a, labels = x^a, las = 1)
}

PlotPower(1:10, 3)

Session 8.4

Exercise 7

In the lab, we applied random forests to the Boston data using mtry=6 and using ntree=25 and ntree=500. Create a plot displaying the test error resulting from random forests on this data set for a more comprehensive range of values for mtry and ntree. You can model your plot after Figure 8.10. Describe the results obtained.

library(randomForest)
library(MASS)
library(tidyverse)
library(ggplot2)
Boston
##         crim    zn indus chas    nox    rm   age     dis rad tax ptratio  black
## 1    0.00632  18.0  2.31    0 0.5380 6.575  65.2  4.0900   1 296    15.3 396.90
## 2    0.02731   0.0  7.07    0 0.4690 6.421  78.9  4.9671   2 242    17.8 396.90
## 3    0.02729   0.0  7.07    0 0.4690 7.185  61.1  4.9671   2 242    17.8 392.83
## 4    0.03237   0.0  2.18    0 0.4580 6.998  45.8  6.0622   3 222    18.7 394.63
## 5    0.06905   0.0  2.18    0 0.4580 7.147  54.2  6.0622   3 222    18.7 396.90
## 6    0.02985   0.0  2.18    0 0.4580 6.430  58.7  6.0622   3 222    18.7 394.12
## 7    0.08829  12.5  7.87    0 0.5240 6.012  66.6  5.5605   5 311    15.2 395.60
## 8    0.14455  12.5  7.87    0 0.5240 6.172  96.1  5.9505   5 311    15.2 396.90
## 9    0.21124  12.5  7.87    0 0.5240 5.631 100.0  6.0821   5 311    15.2 386.63
## 10   0.17004  12.5  7.87    0 0.5240 6.004  85.9  6.5921   5 311    15.2 386.71
## 11   0.22489  12.5  7.87    0 0.5240 6.377  94.3  6.3467   5 311    15.2 392.52
## 12   0.11747  12.5  7.87    0 0.5240 6.009  82.9  6.2267   5 311    15.2 396.90
## 13   0.09378  12.5  7.87    0 0.5240 5.889  39.0  5.4509   5 311    15.2 390.50
## 14   0.62976   0.0  8.14    0 0.5380 5.949  61.8  4.7075   4 307    21.0 396.90
## 15   0.63796   0.0  8.14    0 0.5380 6.096  84.5  4.4619   4 307    21.0 380.02
## 16   0.62739   0.0  8.14    0 0.5380 5.834  56.5  4.4986   4 307    21.0 395.62
## 17   1.05393   0.0  8.14    0 0.5380 5.935  29.3  4.4986   4 307    21.0 386.85
## 18   0.78420   0.0  8.14    0 0.5380 5.990  81.7  4.2579   4 307    21.0 386.75
## 19   0.80271   0.0  8.14    0 0.5380 5.456  36.6  3.7965   4 307    21.0 288.99
## 20   0.72580   0.0  8.14    0 0.5380 5.727  69.5  3.7965   4 307    21.0 390.95
## 21   1.25179   0.0  8.14    0 0.5380 5.570  98.1  3.7979   4 307    21.0 376.57
## 22   0.85204   0.0  8.14    0 0.5380 5.965  89.2  4.0123   4 307    21.0 392.53
## 23   1.23247   0.0  8.14    0 0.5380 6.142  91.7  3.9769   4 307    21.0 396.90
## 24   0.98843   0.0  8.14    0 0.5380 5.813 100.0  4.0952   4 307    21.0 394.54
## 25   0.75026   0.0  8.14    0 0.5380 5.924  94.1  4.3996   4 307    21.0 394.33
## 26   0.84054   0.0  8.14    0 0.5380 5.599  85.7  4.4546   4 307    21.0 303.42
## 27   0.67191   0.0  8.14    0 0.5380 5.813  90.3  4.6820   4 307    21.0 376.88
## 28   0.95577   0.0  8.14    0 0.5380 6.047  88.8  4.4534   4 307    21.0 306.38
## 29   0.77299   0.0  8.14    0 0.5380 6.495  94.4  4.4547   4 307    21.0 387.94
## 30   1.00245   0.0  8.14    0 0.5380 6.674  87.3  4.2390   4 307    21.0 380.23
## 31   1.13081   0.0  8.14    0 0.5380 5.713  94.1  4.2330   4 307    21.0 360.17
## 32   1.35472   0.0  8.14    0 0.5380 6.072 100.0  4.1750   4 307    21.0 376.73
## 33   1.38799   0.0  8.14    0 0.5380 5.950  82.0  3.9900   4 307    21.0 232.60
## 34   1.15172   0.0  8.14    0 0.5380 5.701  95.0  3.7872   4 307    21.0 358.77
## 35   1.61282   0.0  8.14    0 0.5380 6.096  96.9  3.7598   4 307    21.0 248.31
## 36   0.06417   0.0  5.96    0 0.4990 5.933  68.2  3.3603   5 279    19.2 396.90
## 37   0.09744   0.0  5.96    0 0.4990 5.841  61.4  3.3779   5 279    19.2 377.56
## 38   0.08014   0.0  5.96    0 0.4990 5.850  41.5  3.9342   5 279    19.2 396.90
## 39   0.17505   0.0  5.96    0 0.4990 5.966  30.2  3.8473   5 279    19.2 393.43
## 40   0.02763  75.0  2.95    0 0.4280 6.595  21.8  5.4011   3 252    18.3 395.63
##  [ rows 41-476 of the printed data frame omitted for brevity -- the full listing adds no information to the report ]
## 477  4.87141   0.0 18.10    0 0.6140 6.484  93.6  2.3053  24 666    20.2 396.21
## 478 15.02340   0.0 18.10    0 0.6140 5.304  97.3  2.1007  24 666    20.2 349.48
## 479 10.23300   0.0 18.10    0 0.6140 6.185  96.7  2.1705  24 666    20.2 379.70
## 480 14.33370   0.0 18.10    0 0.6140 6.229  88.0  1.9512  24 666    20.2 383.32
## 481  5.82401   0.0 18.10    0 0.5320 6.242  64.7  3.4242  24 666    20.2 396.90
## 482  5.70818   0.0 18.10    0 0.5320 6.750  74.9  3.3317  24 666    20.2 393.07
## 483  5.73116   0.0 18.10    0 0.5320 7.061  77.0  3.4106  24 666    20.2 395.28
## 484  2.81838   0.0 18.10    0 0.5320 5.762  40.3  4.0983  24 666    20.2 392.92
## 485  2.37857   0.0 18.10    0 0.5830 5.871  41.9  3.7240  24 666    20.2 370.73
## 486  3.67367   0.0 18.10    0 0.5830 6.312  51.9  3.9917  24 666    20.2 388.62
## 487  5.69175   0.0 18.10    0 0.5830 6.114  79.8  3.5459  24 666    20.2 392.68
## 488  4.83567   0.0 18.10    0 0.5830 5.905  53.2  3.1523  24 666    20.2 388.22
## 489  0.15086   0.0 27.74    0 0.6090 5.454  92.7  1.8209   4 711    20.1 395.09
## 490  0.18337   0.0 27.74    0 0.6090 5.414  98.3  1.7554   4 711    20.1 344.05
## 491  0.20746   0.0 27.74    0 0.6090 5.093  98.0  1.8226   4 711    20.1 318.43
## 492  0.10574   0.0 27.74    0 0.6090 5.983  98.8  1.8681   4 711    20.1 390.11
## 493  0.11132   0.0 27.74    0 0.6090 5.983  83.5  2.1099   4 711    20.1 396.90
## 494  0.17331   0.0  9.69    0 0.5850 5.707  54.0  2.3817   6 391    19.2 396.90
## 495  0.27957   0.0  9.69    0 0.5850 5.926  42.6  2.3817   6 391    19.2 396.90
## 496  0.17899   0.0  9.69    0 0.5850 5.670  28.8  2.7986   6 391    19.2 393.29
## 497  0.28960   0.0  9.69    0 0.5850 5.390  72.9  2.7986   6 391    19.2 396.90
## 498  0.26838   0.0  9.69    0 0.5850 5.794  70.6  2.8927   6 391    19.2 396.90
## 499  0.23912   0.0  9.69    0 0.5850 6.019  65.3  2.4091   6 391    19.2 396.90
## 500  0.17783   0.0  9.69    0 0.5850 5.569  73.5  2.3999   6 391    19.2 395.77
## 501  0.22438   0.0  9.69    0 0.5850 6.027  79.7  2.4982   6 391    19.2 396.90
## 502  0.06263   0.0 11.93    0 0.5730 6.593  69.1  2.4786   1 273    21.0 391.99
## 503  0.04527   0.0 11.93    0 0.5730 6.120  76.7  2.2875   1 273    21.0 396.90
## 504  0.06076   0.0 11.93    0 0.5730 6.976  91.0  2.1675   1 273    21.0 396.90
## 505  0.10959   0.0 11.93    0 0.5730 6.794  89.3  2.3889   1 273    21.0 393.45
## 506  0.04741   0.0 11.93    0 0.5730 6.030  80.8  2.5050   1 273    21.0 396.90
##     lstat medv
## 1    4.98 24.0
## 2    9.14 21.6
## 3    4.03 34.7
## 4    2.94 33.4
## 5    5.33 36.2
## 6    5.21 28.7
## 7   12.43 22.9
## 8   19.15 27.1
## 9   29.93 16.5
## 10  17.10 18.9
## 11  20.45 15.0
## 12  13.27 18.9
## 13  15.71 21.7
## 14   8.26 20.4
## 15  10.26 18.2
## 16   8.47 19.9
## 17   6.58 23.1
## 18  14.67 17.5
## 19  11.69 20.2
## 20  11.28 18.2
## 21  21.02 13.6
## 22  13.83 19.6
## 23  18.72 15.2
## 24  19.88 14.5
## 25  16.30 15.6
## 26  16.51 13.9
## 27  14.81 16.6
## 28  17.28 14.8
## 29  12.80 18.4
## 30  11.98 21.0
## 31  22.60 12.7
## 32  13.04 14.5
## 33  27.71 13.2
## 34  18.35 13.1
## 35  20.34 13.5
## 36   9.68 18.9
## 37  11.41 20.0
## 38   8.77 21.0
## 39  10.13 24.7
## 40   4.32 30.8
## 41   1.98 34.9
## 42   4.84 26.6
## 43   5.81 25.3
## 44   7.44 24.7
## 45   9.55 21.2
## 46  10.21 19.3
## 47  14.15 20.0
## 48  18.80 16.6
## 49  30.81 14.4
## 50  16.20 19.4
## 51  13.45 19.7
## 52   9.43 20.5
## 53   5.28 25.0
## 54   8.43 23.4
## 55  14.80 18.9
## 56   4.81 35.4
## 57   5.77 24.7
## 58   3.95 31.6
## 59   6.86 23.3
## 60   9.22 19.6
## 61  13.15 18.7
## 62  14.44 16.0
## 63   6.73 22.2
## 64   9.50 25.0
## 65   8.05 33.0
## 66   4.67 23.5
## 67  10.24 19.4
## 68   8.10 22.0
## 69  13.09 17.4
## 70   8.79 20.9
## 71   6.72 24.2
## 72   9.88 21.7
## 73   5.52 22.8
## 74   7.54 23.4
## 75   6.78 24.1
## 76   8.94 21.4
## 77  11.97 20.0
## 78  10.27 20.8
## 79  12.34 21.2
## 80   9.10 20.3
## 81   5.29 28.0
## 82   7.22 23.9
## 83   6.72 24.8
## 84   7.51 22.9
## 85   9.62 23.9
## 86   6.53 26.6
## 87  12.86 22.5
## 88   8.44 22.2
## 89   5.50 23.6
## 90   5.70 28.7
## 91   8.81 22.6
## 92   8.20 22.0
## 93   8.16 22.9
## 94   6.21 25.0
## 95  10.59 20.6
## 96   6.65 28.4
## 97  11.34 21.4
## 98   4.21 38.7
## 99   3.57 43.8
## 100  6.19 33.2
## 101  9.42 27.5
## 102  7.67 26.5
## 103 10.63 18.6
## 104 13.44 19.3
## 105 12.33 20.1
## 106 16.47 19.5
## 107 18.66 19.5
## 108 14.09 20.4
## 109 12.27 19.8
## 110 15.55 19.4
## 111 13.00 21.7
## 112 10.16 22.8
## 113 16.21 18.8
## 114 17.09 18.7
## 115 10.45 18.5
## 116 15.76 18.3
## 117 12.04 21.2
## 118 10.30 19.2
## 119 15.37 20.4
## 120 13.61 19.3
## 121 14.37 22.0
## 122 14.27 20.3
## 123 17.93 20.5
## 124 25.41 17.3
## 125 17.58 18.8
## 126 14.81 21.4
## 127 27.26 15.7
## 128 17.19 16.2
## 129 15.39 18.0
## 130 18.34 14.3
## 131 12.60 19.2
## 132 12.26 19.6
## 133 11.12 23.0
## 134 15.03 18.4
## 135 17.31 15.6
## 136 16.96 18.1
## 137 16.90 17.4
## 138 14.59 17.1
## 139 21.32 13.3
## 140 18.46 17.8
## 141 24.16 14.0
## 142 34.41 14.4
## 143 26.82 13.4
## 144 26.42 15.6
## 145 29.29 11.8
## 146 27.80 13.8
## 147 16.65 15.6
## 148 29.53 14.6
## 149 28.32 17.8
## 150 21.45 15.4
## 151 14.10 21.5
## 152 13.28 19.6
## 153 12.12 15.3
## 154 15.79 19.4
## 155 15.12 17.0
## 156 15.02 15.6
## 157 16.14 13.1
## 158  4.59 41.3
## 159  6.43 24.3
## 160  7.39 23.3
## 161  5.50 27.0
## 162  1.73 50.0
## 163  1.92 50.0
## 164  3.32 50.0
## 165 11.64 22.7
## 166  9.81 25.0
## 167  3.70 50.0
## 168 12.14 23.8
## 169 11.10 23.8
## 170 11.32 22.3
## 171 14.43 17.4
## 172 12.03 19.1
## 173 14.69 23.1
## 174  9.04 23.6
## 175  9.64 22.6
## 176  5.33 29.4
## 177 10.11 23.2
## 178  6.29 24.6
## 179  6.92 29.9
## 180  5.04 37.2
## 181  7.56 39.8
## 182  9.45 36.2
## 183  4.82 37.9
## 184  5.68 32.5
## 185 13.98 26.4
## 186 13.15 29.6
## 187  4.45 50.0
## 188  6.68 32.0
## 189  4.56 29.8
## 190  5.39 34.9
## 191  5.10 37.0
## 192  4.69 30.5
## 193  2.87 36.4
## 194  5.03 31.1
## 195  4.38 29.1
## 196  2.97 50.0
## 197  4.08 33.3
## 198  8.61 30.3
## 199  6.62 34.6
## 200  4.56 34.9
## 201  4.45 32.9
## 202  7.43 24.1
## 203  3.11 42.3
## 204  3.81 48.5
## 205  2.88 50.0
## 206 10.87 22.6
## 207 10.97 24.4
## 208 18.06 22.5
## 209 14.66 24.4
## 210 23.09 20.0
## 211 17.27 21.7
## 212 23.98 19.3
## 213 16.03 22.4
## 214  9.38 28.1
## 215 29.55 23.7
## 216  9.47 25.0
## 217 13.51 23.3
## 218  9.69 28.7
## 219 17.92 21.5
## 220 10.50 23.0
## 221  9.71 26.7
## 222 21.46 21.7
## 223  9.93 27.5
## 224  7.60 30.1
## 225  4.14 44.8
## 226  4.63 50.0
## 227  3.13 37.6
## 228  6.36 31.6
## 229  3.92 46.7
## 230  3.76 31.5
## 231 11.65 24.3
## 232  5.25 31.7
## 233  2.47 41.7
## 234  3.95 48.3
## 235  8.05 29.0
## 236 10.88 24.0
## 237  9.54 25.1
## 238  4.73 31.5
## 239  6.36 23.7
## 240  7.37 23.3
## 241 11.38 22.0
## 242 12.40 20.1
## 243 11.22 22.2
## 244  5.19 23.7
## 245 12.50 17.6
## 246 18.46 18.5
## 247  9.16 24.3
## 248 10.15 20.5
## 249  9.52 24.5
## 250  6.56 26.2
## 251  5.90 24.4
## 252  3.59 24.8
## 253  3.53 29.6
## 254  3.54 42.8
## 255  6.57 21.9
## 256  9.25 20.9
## 257  3.11 44.0
## 258  5.12 50.0
## 259  7.79 36.0
## 260  6.90 30.1
## 261  9.59 33.8
## 262  7.26 43.1
## 263  5.91 48.8
## 264 11.25 31.0
## 265  8.10 36.5
## 266 10.45 22.8
## 267 14.79 30.7
## 268  7.44 50.0
## 269  3.16 43.5
## 270 13.65 20.7
## 271 13.00 21.1
## 272  6.59 25.2
## 273  7.73 24.4
## 274  6.58 35.2
## 275  3.53 32.4
## 276  2.98 32.0
## 277  6.05 33.2
## 278  4.16 33.1
## 279  7.19 29.1
## 280  4.85 35.1
## 281  3.76 45.4
## 282  4.59 35.4
## 283  3.01 46.0
## 284  3.16 50.0
## 285  7.85 32.2
## 286  8.23 22.0
## 287 12.93 20.1
## 288  7.14 23.2
## 289  7.60 22.3
## 290  9.51 24.8
## 291  3.33 28.5
## 292  3.56 37.3
## 293  4.70 27.9
## 294  8.58 23.9
## 295 10.40 21.7
## 296  6.27 28.6
## 297  7.39 27.1
## 298 15.84 20.3
## 299  4.97 22.5
## 300  4.74 29.0
## 301  6.07 24.8
## 302  9.50 22.0
## 303  8.67 26.4
## 304  4.86 33.1
## 305  6.93 36.1
## 306  8.93 28.4
## 307  6.47 33.4
## 308  7.53 28.2
## 309  4.54 22.8
## 310  9.97 20.3
## 311 12.64 16.1
## 312  5.98 22.1
## 313 11.72 19.4
## 314  7.90 21.6
## 315  9.28 23.8
## 316 11.50 16.2
## 317 18.33 17.8
## 318 15.94 19.8
## 319 10.36 23.1
## 320 12.73 21.0
## 321  7.20 23.8
## 322  6.87 23.1
## 323  7.70 20.4
## 324 11.74 18.5
## 325  6.12 25.0
## 326  5.08 24.6
## 327  6.15 23.0
## 328 12.79 22.2
## 329  9.97 19.3
## 330  7.34 22.6
## 331  9.09 19.8
## 332 12.43 17.1
## 333  7.83 19.4
## 334  5.68 22.2
## 335  6.75 20.7
## 336  8.01 21.1
## 337  9.80 19.5
## 338 10.56 18.5
## 339  8.51 20.6
## 340  9.74 19.0
## 341  9.29 18.7
## 342  5.49 32.7
## 343  8.65 16.5
## 344  7.18 23.9
## 345  4.61 31.2
## 346 10.53 17.5
## 347 12.67 17.2
## 348  6.36 23.1
## 349  5.99 24.5
## 350  5.89 26.6
## 351  5.98 22.9
## 352  5.49 24.1
## 353  7.79 18.6
## 354  4.50 30.1
## 355  8.05 18.2
## 356  5.57 20.6
## 357 17.60 17.8
## 358 13.27 21.7
## 359 11.48 22.7
## 360 12.67 22.6
## 361  7.79 25.0
## 362 14.19 19.9
## 363 10.19 20.8
## 364 14.64 16.8
## 365  5.29 21.9
## 366  7.12 27.5
## 367 14.00 21.9
## 368 13.33 23.1
## 369  3.26 50.0
## 370  3.73 50.0
## 371  2.96 50.0
## 372  9.53 50.0
## 373  8.88 50.0
## 374 34.77 13.8
## 375 37.97 13.8
## 376 13.44 15.0
## 377 23.24 13.9
## 378 21.24 13.3
## 379 23.69 13.1
## 380 21.78 10.2
## 381 17.21 10.4
## 382 21.08 10.9
## 383 23.60 11.3
## 384 24.56 12.3
## 385 30.63  8.8
## 386 30.81  7.2
## 387 28.28 10.5
## 388 31.99  7.4
## 389 30.62 10.2
## 390 20.85 11.5
## 391 17.11 15.1
## 392 18.76 23.2
## 393 25.68  9.7
## 394 15.17 13.8
## 395 16.35 12.7
## 396 17.12 13.1
## 397 19.37 12.5
## 398 19.92  8.5
## 399 30.59  5.0
## 400 29.97  6.3
## 401 26.77  5.6
## 402 20.32  7.2
## 403 20.31 12.1
## 404 19.77  8.3
## 405 27.38  8.5
## 406 22.98  5.0
## 407 23.34 11.9
## 408 12.13 27.9
## 409 26.40 17.2
## 410 19.78 27.5
## 411 10.11 15.0
## 412 21.22 17.2
## 413 34.37 17.9
## 414 20.08 16.3
## 415 36.98  7.0
## 416 29.05  7.2
## 417 25.79  7.5
## 418 26.64 10.4
## 419 20.62  8.8
## 420 22.74  8.4
## 421 15.02 16.7
## 422 15.70 14.2
## 423 14.10 20.8
## 424 23.29 13.4
## 425 17.16 11.7
## 426 24.39  8.3
## 427 15.69 10.2
## 428 14.52 10.9
## 429 21.52 11.0
## 430 24.08  9.5
## 431 17.64 14.5
## 432 19.69 14.1
## 433 12.03 16.1
## 434 16.22 14.3
## 435 15.17 11.7
## 436 23.27 13.4
## 437 18.05  9.6
## 438 26.45  8.7
## 439 34.02  8.4
## 440 22.88 12.8
## 441 22.11 10.5
## 442 19.52 17.1
## 443 16.59 18.4
## 444 18.85 15.4
## 445 23.79 10.8
## 446 23.98 11.8
## 447 17.79 14.9
## 448 16.44 12.6
## 449 18.13 14.1
## 450 19.31 13.0
## 451 17.44 13.4
## 452 17.73 15.2
## 453 17.27 16.1
## 454 16.74 17.8
## 455 18.71 14.9
## 456 18.13 14.1
## 457 19.01 12.7
## 458 16.94 13.5
## 459 16.23 14.9
## 460 14.70 20.0
## 461 16.42 16.4
## 462 14.65 17.7
## 463 13.99 19.5
## 464 10.29 20.2
## 465 13.22 21.4
## 466 14.13 19.9
## 467 17.15 19.0
## 468 21.32 19.1
## 469 18.13 19.1
## 470 14.76 20.1
## 471 16.29 19.9
## 472 12.87 19.6
## 473 14.36 23.2
## 474 11.66 29.8
## 475 18.14 13.8
## 476 24.10 13.3
## 477 18.68 16.7
## 478 24.91 12.0
## 479 18.03 14.6
## 480 13.11 21.4
## 481 10.74 23.0
## 482  7.74 23.7
## 483  7.01 25.0
## 484 10.42 21.8
## 485 13.34 20.6
## 486 10.58 21.2
## 487 14.98 19.1
## 488 11.45 20.6
## 489 18.06 15.2
## 490 23.97  7.0
## 491 29.68  8.1
## 492 18.07 13.6
## 493 13.35 20.1
## 494 12.01 21.8
## 495 13.59 24.5
## 496 17.60 23.1
## 497 21.14 19.7
## 498 14.10 18.3
## 499 12.92 21.2
## 500 15.10 17.5
## 501 14.33 16.8
## 502  9.67 22.4
## 503  9.08 20.6
## 504  5.64 23.9
## 505  6.48 22.0
## 506  7.88 11.9
set.seed(1)
train = sample(1:nrow(Boston), nrow(Boston)/2)
boston.test = Boston[-train, "medv"]
mtry_ <- 6:15  # note: Boston has only 13 predictors, so mtry = 14 and 15 get reset to 13
ntree_ <- c(25, 50, 100, 150, 500)
vect_error <- vector()
vect_cate <- vector()
k = 1
for (i in mtry_){
  for (j in ntree_){
    rf.boston = randomForest(medv ~ ., data = Boston, subset = train,
                             mtry = i, importance = TRUE, ntree = j)
    yhat.rf = predict(rf.boston, newdata = Boston[-train, ])
    vect_error[k] = mean((yhat.rf - boston.test)^2)
    vect_cate[k] = str_c(i, j, sep = '-')
    k = k + 1
  }
}
## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
## range
## (warning repeated 10 times, once for each fit with mtry = 14 or 15)
grup = c(rep('g1', 5), rep('g2', 5), rep('g3', 5), rep('g4', 5), rep('g5', 5), rep('g6', 5), 
         rep('g7', 5), rep('g8', 5), rep('g9', 5), rep('g10', 5)) 
df_error = data.frame(vect_cate, vect_error, rep(1:5, 10), grup)
names(df_error) = c('tree_mtry', 'y', 'eje_x', 'grup')
ggplot(df_error, aes(x = eje_x, y=y, group=grup, colour=grup)) +
  geom_line()  + 
  geom_point( size=2, shape=21, fill="white") + 
  theme_minimal()
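With the grid results stored in vect_cate and vect_error, the winning combination can be read off directly (a short sketch using the vectors built above):

best <- which.min(vect_error)
vect_cate[best]   # "mtry-ntree" combination with the lowest test MSE
vect_error[best]  # the corresponding test MSE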

Punto 8

A

Split the data set into a training set and a test set

library(ISLR)
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
set.seed(123)
train <- createDataPartition(y = Carseats$Sales, p = 0.8, list=FALSE, times = 1)
dt_train <- Carseats[train, ]
df_test <- Carseats[-train, ]

B

Fit a regression tree to the training set. Plot the tree, and interpret the results. What test MSE do you obtain?

library(MASS)
library(tree)
## Registered S3 method overwritten by 'tree':
##   method     from
##   print.tree cli
tree.sales <- tree(Sales ~ ., dt_train)
summary(tree.sales)
## 
## Regression tree:
## tree(formula = Sales ~ ., data = dt_train)
## Variables actually used in tree construction:
## [1] "ShelveLoc"   "Price"       "Income"      "CompPrice"   "Advertising"
## [6] "Age"        
## Number of terminal nodes:  18 
## Residual mean deviance:  2.523 = 764.4 / 303 
## Distribution of residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -3.90000 -1.06200 -0.01417  0.00000  1.04800  4.22600
plot(tree.sales)
text(tree.sales ,pretty =0, cex=0.6)
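Part B also asks for the test MSE; a sketch using the split from part A:

yhat.tree <- predict(tree.sales, newdata = df_test)
mean((yhat.tree - df_test$Sales)^2)  # test MSE of the unpruned tree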

C

Use cross-validation in order to determine the optimal level of tree complexity. Does pruning the tree improve the test MSE?

cv.sales <- cv.tree(tree.sales)
plot(cv.sales$size ,cv.sales$dev ,type='b')
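To answer whether pruning helps, the tree can be cut back to the CV-optimal size and rescored on the test set (a sketch; best_size is whichever size minimizes cv.sales$dev):

best_size <- cv.sales$size[which.min(cv.sales$dev)]
prune.sales <- prune.tree(tree.sales, best = best_size)
yhat.prune <- predict(prune.sales, newdata = df_test)
mean((yhat.prune - df_test$Sales)^2)  # test MSE of the pruned tree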

D

Use the bagging approach in order to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important.

set.seed(1)
# Carseats has 10 predictors, so bagging uses mtry = 10 (mtry = 13 would be reset with a warning)
bag.sales = randomForest(Sales ~ ., data = dt_train, mtry = 10, importance = TRUE)
bag.sales
## 
## Call:
##  randomForest(formula = Sales ~ ., data = dt_train, mtry = 10,      importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 10
## 
##           Mean of squared residuals: 2.45914
##                     % Var explained: 68.9
yhat.bag = predict(bag.sales, newdata=df_test[,-1])

cat('MSE TEST \n')
## MSE TEST
sum((yhat.bag-df_test['Sales'])^2)/length(yhat.bag)
## [1] 2.246559
importance(bag.sales)
##               %IncMSE IncNodePurity
## CompPrice   33.522788    256.384174
## Income       7.762270    122.801187
## Advertising 21.076091    164.771460
## Population  -1.409105     82.251739
## Price       69.404219    732.183579
## ShelveLoc   81.003894    793.424922
## Age         25.304258    242.349739
## Education    3.707298     73.960587
## Urban        1.087029     10.579006
## US           2.860203      9.007516
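The same ranking can also be inspected graphically with randomForest's built-in importance plot:

varImpPlot(bag.sales)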

E

Use random forests to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important. Describe the effect of m, the number of variables considered at each split, on the error rate obtained.

rf.sale = randomForest(Sales ~ ., data = dt_train, mtry = 6, importance = TRUE)
yhat.rf = predict(rf.sale, newdata = df_test[, -1])

cat('MSE TEST \n')
## MSE TEST
sum((yhat.rf-df_test['Sales'])^2)/length(yhat.rf)
## [1] 2.293773
importance(rf.sale)
##               %IncMSE IncNodePurity
## CompPrice   24.904328     241.40874
## Income       7.464867     147.36056
## Advertising 18.623228     189.05533
## Population  -1.691185      98.50981
## Price       54.701286     680.50888
## ShelveLoc   66.659911     736.87907
## Age         23.716570     259.61046
## Education    3.398162      82.70636
## Urban       -2.500238      13.53313
## US           4.075596      19.04079
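To describe the effect of m directly, the forest can be refit over a range of mtry values and the test MSE tracked (a sketch):

mtry_vals <- 1:10
test_mse <- sapply(mtry_vals, function(m) {
  fit <- randomForest(Sales ~ ., data = dt_train, mtry = m)
  mean((predict(fit, newdata = df_test) - df_test$Sales)^2)
})
plot(mtry_vals, test_mse, type = "b", xlab = "mtry", ylab = "Test MSE")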

Punto 9

This problem involves the OJ data set which is part of the ISLR package.

A

Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

library(ISLR)
set.seed(1236)
ind_train <- sample(1:1070, 800)
train_OJ <- OJ[ind_train,]
test_OJ <- OJ[-ind_train,]

B

Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?

tree.purch <- tree(Purchase ~ ., train_OJ)
summary(tree.purch)
## 
## Classification tree:
## tree(formula = Purchase ~ ., data = train_OJ)
## Variables actually used in tree construction:
## [1] "LoyalCH"   "PriceDiff"
## Number of terminal nodes:  9 
## Residual mean deviance:  0.7273 = 575.3 / 791 
## Misclassification error rate: 0.1575 = 126 / 800

C

Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed.

tree.purch
## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 800 1076.000 CH ( 0.60125 0.39875 )  
##    2) LoyalCH < 0.5036 355  414.400 MM ( 0.27042 0.72958 )  
##      4) LoyalCH < 0.279374 174  124.100 MM ( 0.11494 0.88506 )  
##        8) LoyalCH < 0.0356415 62   10.240 MM ( 0.01613 0.98387 ) *
##        9) LoyalCH > 0.0356415 112  102.000 MM ( 0.16964 0.83036 ) *
##      5) LoyalCH > 0.279374 181  246.300 MM ( 0.41989 0.58011 )  
##       10) PriceDiff < 0.05 72   70.930 MM ( 0.19444 0.80556 ) *
##       11) PriceDiff > 0.05 109  149.000 CH ( 0.56881 0.43119 ) *
##    3) LoyalCH > 0.5036 445  352.000 CH ( 0.86517 0.13483 )  
##      6) PriceDiff < -0.39 28   35.160 MM ( 0.32143 0.67857 )  
##       12) LoyalCH < 0.974803 21   17.220 MM ( 0.14286 0.85714 ) *
##       13) LoyalCH > 0.974803 7    5.742 CH ( 0.85714 0.14286 ) *
##      7) PriceDiff > -0.39 417  268.000 CH ( 0.90168 0.09832 )  
##       14) LoyalCH < 0.705326 128  134.500 CH ( 0.78125 0.21875 )  
##         28) PriceDiff < 0.265 69   90.350 CH ( 0.63768 0.36232 ) *
##         29) PriceDiff > 0.265 59   23.720 CH ( 0.94915 0.05085 ) *
##       15) LoyalCH > 0.705326 289  106.000 CH ( 0.95502 0.04498 ) *

D

Create a plot of the tree, and interpret the results.

plot(tree.purch)
text(tree.purch ,pretty =0, cex=0.6)

The first split uses the variable LoyalCH, which measures the customer's brand loyalty. Since this variable takes values between 0 and 1, observations are routed toward MM or CH depending on its value.

E

Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?

tree.pred=predict(tree.purch, test_OJ[, -1], type="class")
table(tree.pred ,test_OJ[, 1])
##          
## tree.pred  CH  MM
##        CH 154  33
##        MM  18  65
cat("Correct classification rate: \n")
## Correct classification rate:
(154+65)/(154+65+18+33)
## [1] 0.8111111
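The test error rate itself is the complement of this, and can be computed directly from the predictions:

mean(tree.pred != test_OJ$Purchase)  # test error rate: (18 + 33) / 270 ≈ 0.1889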

F

Apply the cv.tree() function to the training set in order to determine the optimal tree size.

cv.purch <- cv.tree(tree.purch, FUN=prune.misclass)
names(cv.purch)
## [1] "size"   "dev"    "k"      "method"
cv.purch
## $size
## [1] 9 6 5 3 2 1
## 
## $dev
## [1] 139 139 146 156 157 319
## 
## $k
## [1]  -Inf   0.0   5.0   7.5  10.0 163.0
## 
## $method
## [1] "misclass"
## 
## attr(,"class")
## [1] "prune"         "tree.sequence"

G

Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.

plot(cv.purch$size, cv.purch$dev ,type="b")

plot(cv.purch$k, cv.purch$dev ,type="b")

H

Which tree size corresponds to the lowest cross-validated classification error rate?

The tree size with the best cross-validated classification rate is six terminal nodes (tied with the full nine-node tree at dev = 139).

I

Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.

prune.purch = prune.misclass(tree.purch, best = 6)
plot(prune.purch )
text(prune.purch ,pretty =0)

J

Compare the training error rates between the pruned and unpruned trees. Which is higher?

tree.pred=predict(prune.purch, test_OJ[, -1], type="class")
table(tree.pred ,test_OJ[, 1])
##          
## tree.pred  CH  MM
##        CH 154  33
##        MM  18  65
cat("Correct classification rate: \n")
## Correct classification rate:
(154+65)/(154+65+18+33)
## [1] 0.8111111

The classification rate did not change: the six-node tree already reaches the maximum rate, so there is no difference between using nine or six terminal nodes.

Punto 10

We now use boosting to predict Salary in the Hitters data set.

A

Remove the observations for whom the salary information is unknown, and then log-transform the salaries.

# note: the exercise refers to Hitters/Salary, but the procedure is applied here to Carseats/Sales
Carseats <- Carseats[Carseats$Sales > 0, ]
Carseats['log_sales'] <- log(Carseats$Sales)

B

Create a training set consisting of the first 200 observations, and a test set consisting of the remaining observations.

train <- Carseats[1:200, -1]
test <- Carseats[201:nrow(Carseats), -1]  # nrow(), not length(): length() counts columns

C

Perform boosting on the training set with 1,000 trees for a range of values of the shrinkage parameter λ. Produce a plot with different shrinkage values on the x-axis and the corresponding training set MSE on the y-axis.

library (gbm)
## Loaded gbm 2.1.8
lmda <- c(0.1, 0.2, 0.3, 0.4, 0.5)
vect_error <- vector()
k <- 1
for (i in lmda) {
  boost.sale = gbm(log_sales ~ ., data = train, distribution = "gaussian",
                   n.trees = 1000, interaction.depth = 4, shrinkage = i, verbose = F)
  # training MSE at the final (1000th) iteration
  vect_error[k] <- boost.sale$train.error[1000]
  k = k + 1
}

plot(lmda, vect_error, type="b")

D

Produce a plot with different shrinkage values on the x-axis and the corresponding test set MSE on the y-axis.

lmda <- c(0.1, 0.2, 0.3, 0.4, 0.5)
vect_error <- vector()
k <- 1
for (i in lmda) {
  boost.sales = gbm(log_sales ~ ., data = train, distribution = "gaussian",
                    n.trees = 1000, interaction.depth = 4, shrinkage = i, verbose = F)
  yhat.boost = predict(boost.sales, newdata = test[, -length(test)], n.trees = 1000)
  vect_error[k] <- mean((yhat.boost - test$log_sales)^2)
  k = k + 1
}

plot(lmda, vect_error, type="b")
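The shrinkage value that minimizes the test MSE can then be read off the stored vector:

lmda[which.min(vect_error)]  # best shrinkage on the test set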

E

Which variables appear to be the most important predictors in the boosted model?

boost.sales = gbm(log_sales ~ ., data = train, distribution = "gaussian",
                  n.trees = 1000, interaction.depth = 4, shrinkage = 0.2, verbose = F)
summary(boost.sales)

##                     var    rel.inf
## Price             Price 33.0269938
## CompPrice     CompPrice 18.0842664
## ShelveLoc     ShelveLoc 11.2072020
## Population   Population 10.4450670
## Income           Income  9.5264010
## Age                 Age  8.5999683
## Advertising Advertising  4.2177659
## Education     Education  3.5275449
## US                   US  0.8530239
## Urban             Urban  0.5117668

F

Now apply bagging to the training set. What is the test set MSE for this approach?

# the training set has 10 predictors, so bagging uses mtry = 10
bag.sales <- randomForest(log_sales ~ ., data = train, mtry = 10, importance = TRUE)
bag.sales
## 
## Call:
##  randomForest(formula = log_sales ~ ., data = train, mtry = 10,      importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 10
## 
##           Mean of squared residuals: 0.1777817
##                     % Var explained: 46.42
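The OOB summary above does not answer the question asked; the test-set MSE for bagging can be computed from the split defined in part B (a sketch):

yhat.bag <- predict(bag.sales, newdata = test)
mean((yhat.bag - test$log_sales)^2)  # bagging test MSE on the log scale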

Punto 11

This question uses the Caravan data set.

A

Create a training set consisting of the first 1,000 observations, and a test set consisting of the remaining observations.

df_caravan <- Caravan
df_caravan['Purchase'] <- ifelse(df_caravan['Purchase'] == 'Yes', 1, 0)
#df_caravan <- df_caravan[, -c(50, 71)]
# note: the exercise asks for the first 1,000 observations; 2,000 are used here
cr_train <- df_caravan[1:2000, ]
cr_test <- df_caravan[2001:5822, ]

B

Fit a boosting model to the training set with Purchase as the response and the other variables as predictors. Use 1,000 trees, and a shrinkage value of 0.01. Which predictors appear to be the most important?

library(caret)
# note: the exercise specifies shrinkage = 0.01; shrinkage = 0.1 is used here
mod_11 <- gbm(Purchase ~ ., data = cr_train, shrinkage = 0.1, distribution = 'bernoulli', n.trees = 1000, verbose = F)
#best.iter = gbm.perf(mod_11, method="cv")
summary(mod_11)

##               var    rel.inf
## MINK3045 MINK3045 8.39479272
## MBERMIDD MBERMIDD 7.72016096
## PBYSTAND PBYSTAND 7.51821041
## MOSTYPE   MOSTYPE 5.57126048
## MSKC         MSKC 5.06909281
## MSKB1       MSKB1 5.00428283
## MGODGE     MGODGE 4.35834765
## PPERSAUT PPERSAUT 3.91700847
## MKOOPKLA MKOOPKLA 3.43841388
## MOPLMIDD MOPLMIDD 3.38202295
## MOPLHOOG MOPLHOOG 3.35756831
## PBRAND     PBRAND 3.15489915
## MFGEKIND MFGEKIND 2.57315068
## MBERARBG MBERARBG 2.16176441
## MBERHOOG MBERHOOG 2.15526570
## MINKGEM   MINKGEM 1.77697778
## MBERZELF MBERZELF 1.60730204
## MINK7512 MINK7512 1.58602671
## APERSAUT APERSAUT 1.54079309
## MSKA         MSKA 1.50888225
## PAANHANG PAANHANG 1.40864422
## MOPLLAAG MOPLLAAG 1.36741999
## PLEVEN     PLEVEN 1.30270766
## MFWEKIND MFWEKIND 1.28464116
## MINKM30   MINKM30 1.22017731
## PFIETS     PFIETS 1.20555874
## MINK4575 MINK4575 1.13879147
## MZPART     MZPART 1.13828021
## MAUT1       MAUT1 1.10181701
## PWAPART   PWAPART 1.01119861
## MGODRK     MGODRK 0.96165858
## MSKB2       MSKB2 0.95031002
## MRELGE     MRELGE 0.86058102
## MHHUUR     MHHUUR 0.70049840
## MHKOOP     MHKOOP 0.69119951
## MGODPR     MGODPR 0.67839635
## MAUT0       MAUT0 0.67006776
## MBERARBO MBERARBO 0.60427847
## MGODOV     MGODOV 0.53803527
## MSKD         MSKD 0.51877707
## MAUT2       MAUT2 0.48268161
## ALEVEN     ALEVEN 0.46570641
## MINK123M MINK123M 0.43136697
## MFALLEEN MFALLEEN 0.43131381
## AFIETS     AFIETS 0.42202325
## MRELOV     MRELOV 0.41310740
## MZFONDS   MZFONDS 0.40806918
## MGEMOMV   MGEMOMV 0.37311645
## MGEMLEEF MGEMLEEF 0.29853809
## MAANTHUI MAANTHUI 0.24660372
## MOSHOOFD MOSHOOFD 0.20658351
## PMOTSCO   PMOTSCO 0.20393976
## PGEZONG   PGEZONG 0.11370028
## MRELSA     MRELSA 0.11320187
## ABRAND     ABRAND 0.09334856
## MBERBOER MBERBOER 0.09141374
## PWABEDR   PWABEDR 0.05602326
## PWALAND   PWALAND 0.00000000
## PBESAUT   PBESAUT 0.00000000
## PVRAAUT   PVRAAUT 0.00000000
## PTRACTOR PTRACTOR 0.00000000
## PWERKT     PWERKT 0.00000000
## PBROM       PBROM 0.00000000
## PPERSONG PPERSONG 0.00000000
## PWAOREG   PWAOREG 0.00000000
## PZEILPL   PZEILPL 0.00000000
## PPLEZIER PPLEZIER 0.00000000
## PINBOED   PINBOED 0.00000000
## AWAPART   AWAPART 0.00000000
## AWABEDR   AWABEDR 0.00000000
## AWALAND   AWALAND 0.00000000
## ABESAUT   ABESAUT 0.00000000
## AMOTSCO   AMOTSCO 0.00000000
## AVRAAUT   AVRAAUT 0.00000000
## AAANHANG AAANHANG 0.00000000
## ATRACTOR ATRACTOR 0.00000000
## AWERKT     AWERKT 0.00000000
## ABROM       ABROM 0.00000000
## APERSONG APERSONG 0.00000000
## AGEZONG   AGEZONG 0.00000000
## AWAOREG   AWAOREG 0.00000000
## AZEILPL   AZEILPL 0.00000000
## APLEZIER APLEZIER 0.00000000
## AINBOED   AINBOED 0.00000000
## ABYSTAND ABYSTAND 0.00000000
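The exercise continues (part C in ISLR) by predicting on the test set and flagging as buyers those with an estimated probability above 20%; a sketch of that step:

probs <- predict(mod_11, newdata = cr_test, n.trees = 1000, type = "response")
pred <- ifelse(probs > 0.2, 1, 0)
table(pred, cr_test$Purchase)  # confusion matrix on the test set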

Punto 12

Apply boosting, bagging, and random forests to a data set of your choice. Be sure to fit the models on a training set and to evaluate their performance on a test set. How accurate are the results compared to simple methods like linear or logistic regression? Which of these approaches yields the best performance?

# mtcars data set
# predicting miles per gallon
# bagging
mod_mpg <- randomForest(mpg ~ ., data = mtcars, mtry = 10, importance = TRUE)
mod_mpg
## 
## Call:
##  randomForest(formula = mpg ~ ., data = mtcars, mtry = 10, importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 10
## 
##           Mean of squared residuals: 5.628086
##                     % Var explained: 84.01
# random forest (note: mtry = 10 equals the number of predictors, so this is still
# bagging; a true random forest would use mtry < 10, e.g. p/3 ≈ 3)
mpg_rf <- randomForest(mpg ~ ., data = mtcars, mtry = 10, ntree = 100, importance = TRUE)
mpg_rf
## 
## Call:
##  randomForest(formula = mpg ~ ., data = mtcars, mtry = 10, ntree = 100,      importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 100
## No. of variables tried at each split: 10
## 
##           Mean of squared residuals: 5.604802
##                     % Var explained: 84.07
# linear regression
mpg_bo <- lm(mpg ~., data=mtcars)
summary(mpg_bo)
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4506 -1.6044 -0.1196  1.2193  4.6271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 12.30337   18.71788   0.657   0.5181  
## cyl         -0.11144    1.04502  -0.107   0.9161  
## disp         0.01334    0.01786   0.747   0.4635  
## hp          -0.02148    0.02177  -0.987   0.3350  
## drat         0.78711    1.63537   0.481   0.6353  
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739  
## vs           0.31776    2.10451   0.151   0.8814  
## am           2.52023    2.05665   1.225   0.2340  
## gear         0.65541    1.49326   0.439   0.6652  
## carb        -0.19942    0.82875  -0.241   0.8122  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
## F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07

The three models behave very similarly in terms of MSE, differing only slightly. The bagging and random-forest fits might still improve depending on the number of trees grown and the number of variables tried at each split, i.e., by searching a grid for the optimal parameters.
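The exercise asks that the models be fit on a training set and evaluated on a test set, while the fits above use all of mtcars. A minimal base-R sketch of that protocol, shown here for the linear model only (the same split can be reused for the bagging and random-forest fits; the 70/30 ratio is our choice):

```r
set.seed(1)
n <- nrow(mtcars)
idx <- sample(n, floor(0.7 * n))       # 70/30 train-test split
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

fit <- lm(mpg ~ ., data = train)       # fit on the training set only
pred <- predict(fit, newdata = test)
test_mse <- mean((test$mpg - pred)^2)  # evaluate on held-out data
test_mse
```

Comparing test_mse across the three models on the same split gives the out-of-sample comparison the exercise asks for.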

Sesion 9.7.2

PUNTO 4

Generate a simulated two-class data set with 100 observations and two features in which there is a visible but non-linear separation between the two classes. Show that in this setting, a support vector machine with a polynomial kernel (with degree greater than 1) or a radial kernel will outperform a support vector classifier on the training data. Which technique performs best on the test data? Make plots and report training and test error rates in order to back up your assertions.

Simulated data

  • We generate a simulated two-class data set with two features and a visible non-linear separation between the classes.
set.seed(131019)
x <- matrix(rnorm(200 * 2), ncol = 2)
x[1:100, ] <- x[1:100, ] + 3
x[101:150, ] <- x[101:150, ] - 3
y <- c(rep(1, 150), rep(2, 50))
dat <- data.frame(x = x, y = as.factor(y))
  • Plot of the data, with each class shown in a different color; note that the separation is not linear.
plot(x, col=y, pch =16, las=1);grid()

Support vector machine with a polynomial kernel

Support vector machine model

train <- sample(200, 100)
svmfitPol <- svm(y ~ ., data = dat[train, ], kernel = "polynomial", cost = 1, degree = 2)
plot(svmfitPol, dat[train, ])

  • From this plot we can see that the quadratic polynomial support vector machine separates the two classes quite well.

Correct and incorrect classification rates

test <- dat[-train,]
predict_svm <- predict(svmfitPol, test)
mc_svm <- table(predict_svm, test$y)
mc_svm %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     1   2
1   67   2
2   13  18
cl_bsmv <- (mc_svm[1,1]+mc_svm[2,2])/length(test$y)
cl_msvm <- (mc_svm[1,2]+mc_svm[2,1])/length(test$y)
kable(cl_bsmv,col.names = "Buena Clasificacion")
Buena Clasificacion
0.85
kable(cl_msvm,col.names = "Mala Clasificacion")
Mala Clasificacion
0.15
  • From the confusion matrix above, the correct-classification rate on the simulated test data is 85%.
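The good/bad rates computed cell by cell throughout this document can be read off any confusion matrix directly; a small base-R helper (the function name is ours):

```r
# Correct-classification rate: diagonal of the confusion matrix over its total
accuracy <- function(tab) sum(diag(tab)) / sum(tab)

mc <- matrix(c(67, 13, 2, 18), nrow = 2)  # the confusion matrix shown above
accuracy(mc)      # 0.85
1 - accuracy(mc)  # 0.15 (misclassification rate)
```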

Support vector machine with a radial kernel

Support vector machine model

train <- sample(200, 100)
svmfitRad <- svm(y ~ ., data = dat[train, ], kernel = "radial", cost = 1)
plot(svmfitRad, dat[train, ])

  • The plot shows a fit very similar to the previous case, except for the tail that extends to capture more points of one class.

Correct and incorrect classification rates

test <- dat[-train,]
predict_svm2 <- predict(svmfitRad, test)
mc_svm2 <- table(predict_svm2, test$y)
mc_svm2 %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     1   2
1   77   1
2    0  22
Buena Clasificación
0.99
Mala Clasificación
0.01
  • The error rate with the radial kernel is much lower than with the polynomial kernel: it misclassified only 1% of the test observations.

PUNTO 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

Parts:

A

Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:

We create data with n = 500 and p = 2 such that the observations belong to two classes with a quadratic decision boundary between them.

x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)

B

Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.

plot(x1[y == 0], x2[y == 0], col = "red", xlab = "X1", ylab = "X2", las = 1, pch = 16)
points(x1[y == 1], x2[y == 1], col = "blue",pch=16)

C

Fit a logistic regression model to the data, using X1 and X2 as predictors

We fit a logistic regression model with x1 and x2 as covariates.

mod_log <- glm(y~x1+x2, family ="binomial")

D

Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

Model fit

set.seed(131019)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- as.integer(x1 ^ 2 - x2 ^ 2 > 0)
datos <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))
lr.fit <- glm(y ~ ., data = datos, family = 'binomial')
lr.prob <- predict(lr.fit, newdata = datos, type = 'response')
lr.prob[lr.prob >.5]<-1
lr.prob[lr.prob <.5]<-0
data <- data.frame(datos,lr.prob)
data$lr.prob <- as.factor(data$lr.prob)
plot(data$x1,data$x2, col=data$lr.prob, xlab = "X1", ylab = "X2", las=1,pch=16)

Confusion Matrix

mc_glmsvm <- table(lr.prob, datos$y)
mc_glmsvm %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
0 1
0 166 165
1 91 78

Correct and incorrect classification rates

clasificacion_glmsvm1 <- sum(diag(mc_glmsvm)) / sum(mc_glmsvm)
clasificacion_glmsvm <- 1 - clasificacion_glmsvm1
kable(clasificacion_glmsvm1,col.names = "Buena Clasificación")
Buena Clasificación
0.488
kable(clasificacion_glmsvm,col.names = "Mala Clasificación")
Mala Clasificación
0.512
  • For the glm fit, the misclassification rate is about 51%, so the model is not classifying the observations into the correct class: a linear decision boundary cannot capture the quadratic separation.
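That the boundary is linear follows directly from the fitted coefficients: the model predicts class 1 exactly when b0 + b1*x1 + b2*x2 > 0, i.e. on one side of a straight line. A base-R sketch recovering that line (same seed as the chunk above):

```r
set.seed(131019)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y  <- as.integer(x1^2 - x2^2 > 0)
fit <- glm(y ~ x1 + x2, family = binomial)

# The 0.5-probability boundary is b0 + b1*x1 + b2*x2 = 0; solve for x2:
b <- coef(fit)
intercept <- unname(-b[1] / b[3])
slope     <- unname(-b[2] / b[3])
c(intercept = intercept, slope = slope)  # a straight line in the (x1, x2) plane
```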

E

Fitted model

Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1 × X2, log(X2), and so forth).

We now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors.

set.seed(1)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- as.integer(x1 ^ 2 - x2 ^ 2 > 0)
dat2 <- data.frame(x1 = x1 , x2 =x2, y = as.factor(y))
lr.fit2 <- glm(y ~ poly(x1, 3) + poly(x2, 3), data = dat2, family = 'binomial')

F

Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

Predictions

lr.prob2 <- predict(lr.fit2, newdata = dat2, type = 'response')
lr.prob2[lr.prob2 >.5]<-1
lr.prob2[lr.prob2 <.5]<-0
data2 <- data.frame(dat2,lr.prob2)
data2$lr.prob2 <- as.factor(data2$lr.prob2)
plot(data2$x1,data2$x2, col=data2$lr.prob2, xlab = "X1", ylab = "X2", las=1,pch=16)

Correct and incorrect classification rates

mc_glmsvm2 <- table(lr.prob2, dat2$y)
mc_glmsvm2 %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
0 1
0 261 0
1 0 239
clasificacion_glmsvm23 <- sum(diag(mc_glmsvm2)) / sum(mc_glmsvm2)
clasificacion_glmsvm2 <- 1 - clasificacion_glmsvm23
kable(clasificacion_glmsvm23,col.names = "Buena Clasificación")
Buena Clasificación
1
kable(clasificacion_glmsvm2,col.names = "Mala Clasificación")
Mala Clasificación
0
  • The model above classifies every observation into the correct class: with cubic polynomial terms in x1 and x2, the correct-classification rate is 100%.

G

Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

Fitted model

  • We fit a support vector classifier (linear kernel) to the data.
set.seed(1)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- as.integer(x1 ^ 2 - x2 ^ 2 > 0)
data2 <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))
svm.fit <- svm(y ~ ., data = data2, kernel = "linear")  # fit on data2, not the earlier `datos`

Predictions and plot

svm.prob <- predict(svm.fit, newdata = data2)  # for a classification SVM, predict() returns class labels
datta <- data.frame(data2,svm.prob)
datta$svm.prob <- as.factor(datta$svm.prob)
plot(datta$x1,datta$x2, col=datta$svm.prob,xlab="X1",ylab="X2",las=1,pch =16)

H

Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

Fitted model

  • We fit an SVM with a radial (non-linear) kernel.
set.seed(1)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- as.integer(x1 ^ 2 - x2 ^ 2 > 0)
data_2 <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))
svm_fit <- svm(y ~ ., data = data_2, kernel = "radial")  # fit on data_2, not the earlier `datos`

Predictions and plot

svm_prob <- predict(svm_fit, newdata = data_2)  # predicted class labels
datta_2 <- data.frame(data_2,svm_prob)
datta_2$svm_prob <- as.factor(datta_2$svm_prob)
plot(datta_2$x1,datta_2$x2, col=datta_2$svm_prob, xlab="X1",ylab="X2",las=1,pch=16)

I

Comment on your results.

Confusion matrix, linear kernel

mc_svml <- table(svm.prob, data2$y)
mc_svml %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
0 1
0 139 150
1 118 93
clasificacion_svm1 <- sum(diag(mc_svml)) / sum(mc_svml)
clasificacion_svm <- 1 - clasificacion_svm1
kable(clasificacion_svm1, col.names = "Clasificación buena")
Clasificación buena
0.464
kable(clasificacion_svm, col.names = "clasificación mala")
clasificación mala
0.536
  • Note that the svm model with a linear kernel has a misclassification rate of about 54%, so the model is not discriminating the class of each observation correctly.

Confusion matrix, radial kernel

mc_svmr <- table(svm_prob, data_2$y)
mc_svmr %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
0 1
0 137 127
1 120 116
clasificacion_svmr1 <- sum(diag(mc_svmr)) / sum(mc_svmr)
clasificacion_svmr <- 1 - clasificacion_svmr1
kable(clasificacion_svmr1, col.names = "Clasificación buena")
Clasificación buena
0.506
kable(clasificacion_svmr, col.names = "clasificación mala")
clasificación mala
0.494
  • Note that the radial-kernel model is slightly better: its correct-classification rate is about 51%, while the linear-kernel model reaches only around 46%.

PUNTO 6

At the end of Section 9.6.1, it is claimed that in the case of data that is just barely linearly separable, a support vector classifier with a small value of cost that misclassifies a couple of training observations may perform better on test data than one with a huge value of cost that does not misclassify any training observations. You will now investigate this claim.

Parts:

A

Generate two-class data with p = 2 in such a way that the classes are just barely linearly separable.

We simulate two-class data with p = 2 such that the classes are just barely linearly separable.

set.seed(1)
obs <-  1000
x1 <- runif(obs, min = -4, max = 4)
x2 <- runif(obs, min = -1, max = 16)
y <- ifelse(x2 > x1 ^ 2, 0, 1)
daticos <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))
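As a quick check on the simulated data, we can look at the class balance and at how close the observations come to the generating boundary x2 = x1^2; the smaller the minimum gap, the more "barely" the classes are separated (the gap measure is our own choice):

```r
set.seed(1)
obs <- 1000
x1 <- runif(obs, min = -4, max = 4)
x2 <- runif(obs, min = -1, max = 16)
y <- ifelse(x2 > x1^2, 0, 1)

table(y)               # class balance
gap <- abs(x2 - x1^2)  # vertical distance from each point to the boundary curve
min(gap)               # smallest margin between an observation and the boundary
```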

B

Compute the cross-validation error rates for support vector classifiers with a range of cost values. How many training errors are misclassified for each value of cost considered, and how does this relate to the cross-validation errors obtained?

We compute the cross-validation error rates for support vector classifiers over a range of cost values.

set.seed(1)
tune.out <- tune(svm, y ~ ., data = daticos, kernel = "linear",
                 ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))

Summary of the tuning results:

summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##  0.01
## 
## - best performance: 0.232 
## 
## - Detailed performance results:
##    cost error dispersion
## 1 1e-03 0.377 0.03335000
## 2 1e-02 0.232 0.04467164
## 3 1e-01 0.239 0.04557046
## 4 1e+00 0.238 0.04467164
## 5 5e+00 0.238 0.04467164
## 6 1e+01 0.238 0.04467164
## 7 1e+02 0.238 0.04467164
  • Looking at the errors across cost values, the lowest cross-validation error occurs at cost = 0.01. The tune() function stores the best model obtained, as follows:
bestmod <- tune.out$best.model
summary(bestmod)
## 
## Call:
## best.tune(method = svm, train.x = y ~ ., data = daticos, ranges = list(cost = c(0.001, 
##     0.01, 0.1, 1, 5, 10, 100)), kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  639
## 
##  ( 319 320 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  0 1

C

Generate an appropriate test data set, and compute the test errors corresponding to each of the values of cost considered. Which value of cost leads to the fewest test errors, and how does this compare to the values of cost that yield the fewest training errors and the fewest cross-validation errors?

  • We partition the data set into training and test sets
n1 <- 1000
n <- sample(1:nrow(daticos), n1*0.7)
da.train <- daticos[n,]
da.test <- daticos[-n,]
  • Test error rates for each value of cost on the test data:
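The chunk that produced the error rates below is not shown. A sketch consistent with the surrounding code, reusing the names cost.grid and err.rate.test that the kable() call below expects (assumes e1071 is loaded; exact numbers depend on the random split):

```r
library(e1071)

set.seed(1)
obs <- 1000
x1 <- runif(obs, min = -4, max = 4)
x2 <- runif(obs, min = -1, max = 16)
daticos <- data.frame(x1 = x1, x2 = x2, y = as.factor(ifelse(x2 > x1^2, 0, 1)))
n <- sample(1:nrow(daticos), obs * 0.7)
da.train <- daticos[n, ]
da.test  <- daticos[-n, ]

cost.grid <- c(0.001, 0.01, 0.1, 1, 5, 10, 100)
err.rate.test <- sapply(cost.grid, function(cc) {
  fit <- svm(y ~ ., data = da.train, kernel = "linear", cost = cc)
  mean(predict(fit, newdata = da.test) != da.test$y)  # test misclassification rate
})
err.rate.test
```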
## [1] 0.3533333 0.2200000 0.2333333 0.2333333 0.2333333 0.2333333 0.2333333
kable(paste('El cost', cost.grid[which.min(err.rate.test)], 'tiene el minimo error de prueba de:', min(err.rate.test)), col.names = "Resultado")
Resultado
El cost 0.01 tiene el minimo error de prueba de: 0.22

D

Discuss your results.

  • After searching for the cost that minimizes the classification error, cost = 0.01 is the most adequate choice, in agreement with the cross-validation results in (b).

PUNTO 7

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

Parts:

A

Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

In this problem we use support vector approaches to predict whether a given car gets high or low gas mileage, based on the Auto data set.

Data description:

A data frame with 392 observations on the following 9 variables:

  • mpg: miles per gallon.
  • cylinders: number of cylinders, between 4 and 8.
  • displacement: engine displacement (cubic inches).
  • horsepower: engine horsepower.
  • weight: vehicle weight (lbs.).
  • acceleration: time to accelerate from 0 to 60 mph (seconds).
  • year: model year.
  • origin: origin of the car (1. American, 2. European, 3. Japanese).
  • name: vehicle name.

The original data had 408 observations; 16 observations with missing values were removed.

We create a binary variable that takes the value one when mpg is above its median and zero when it is below.

mediana <- median(Auto$mpg)
kable(mediana, col.names="Mediana")
Mediana
22.75
median_mpg <- ifelse(Auto$mpg > mediana, 1, 0)
bd_auto <- data.frame(Auto,median_mpg)
bd_auto <- bd_auto[,-1]

B

Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

We compute the cross-validation error rates of support vector classifiers over the range of cost values.

tune.auto <- tune(svm, median_mpg ~ ., data = bd_auto, kernel = 'linear', ranges = list(cost=c(0.001 , 0.01, 0.1, 1,5,10,100)))

Now let us look at the model summary:

summary(tune.auto)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.0936813 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-03 0.10857094 0.02384915
## 2 1e-02 0.10413206 0.03237924
## 3 1e-01 0.10170970 0.03407414
## 4 1e+00 0.09368130 0.02210164
## 5 5e+00 0.09789524 0.02593135
## 6 1e+01 0.10338815 0.02768520
## 7 1e+02 0.12046961 0.03412864
  • From the model summary, the smallest error occurs at cost = 1. (Note that because median_mpg was left numeric, svm() performed eps-regression rather than classification here; part (d) below converts it to a factor.)

C

Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

We repeat part (b), this time using SVMs with radial and polynomial kernels, with different values of gamma, degree, and cost.

Radial-kernel model

tune.auto2 <- tune(svm, median_mpg ~ ., data = bd_auto, kernel = "radial", ranges = list(cost=c(0.001 , 0.01, 0.1, 1,5,10,100),gamma=c(0.5,1,2,3,4)))
summary(tune.auto2)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##     1   0.5
## 
## - best performance: 0.06403344 
## 
## - Detailed performance results:
##     cost gamma      error  dispersion
## 1  1e-03   0.5 0.47919628 0.090617665
## 2  1e-02   0.5 0.39225952 0.073074319
## 3  1e-01   0.5 0.08479546 0.018008475
## 4  1e+00   0.5 0.06403344 0.017487465
## 5  5e+00   0.5 0.06947274 0.023052725
## 6  1e+01   0.5 0.07196826 0.025563570
## 7  1e+02   0.5 0.07736469 0.031480222
## 8  1e-03   1.0 0.48701642 0.092222527
## 9  1e-02   1.0 0.46450268 0.087598554
## 10 1e-01   1.0 0.27853302 0.051749479
## 11 1e+00   1.0 0.09805619 0.017002176
## 12 5e+00   1.0 0.10315891 0.021590590
## 13 1e+01   1.0 0.10444029 0.023985435
## 14 1e+02   1.0 0.10444356 0.023981735
## 15 1e-03   2.0 0.48883144 0.092575461
## 16 1e-02   2.0 0.48210414 0.090932581
## 17 1e-01   2.0 0.41946497 0.076343025
## 18 1e+00   2.0 0.20360650 0.009336804
## 19 5e+00   2.0 0.20469786 0.011348028
## 20 1e+01   2.0 0.20469786 0.011348028
## 21 1e+02   2.0 0.20469786 0.011348028
## 22 1e-03   3.0 0.48900970 0.092611431
## 23 1e-02   3.0 0.48362255 0.091190782
## 24 1e-01   3.0 0.43294417 0.078198238
## 25 1e+00   3.0 0.23066226 0.007152203
## 26 5e+00   3.0 0.23087457 0.007553667
## 27 1e+01   3.0 0.23087457 0.007553667
## 28 1e+02   3.0 0.23087457 0.007553667
## 29 1e-03   4.0 0.48907195 0.092626479
## 30 1e-02   4.0 0.48405713 0.091236087
## 31 1e-01   4.0 0.43653305 0.078345767
## 32 1e+00   4.0 0.23550898 0.006188529
## 33 5e+00   4.0 0.23557570 0.006270171
## 34 1e+01   4.0 0.23557570 0.006270171
## 35 1e+02   4.0 0.23557570 0.006270171
bestmod <- tune.auto2$best.model
bestmod
## 
## Call:
## best.tune(method = svm, train.x = median_mpg ~ ., data = bd_auto, 
##     ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100), gamma = c(0.5, 
##         1, 2, 3, 4)), kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.5 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  281
  • From the summary we see that the best radial-kernel model has gamma = 0.5 and cost = 1.

Polynomial-kernel model

tune.auto3 <- tune(svm, median_mpg ~ ., data = bd_auto, kernel = "polynomial", ranges = list(cost=c(0.001 , 0.01, 0.1, 1,5,10,100),gamma=c(0.5,1,2,3,4), degree=c(2,3,4)))
summary(tune.auto3)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##   cost gamma degree
##  0.001     2      3
## 
## - best performance: 0.1118502 
## 
## - Detailed performance results:
##      cost gamma degree     error dispersion
## 1   1e-03   0.5      2 0.3037207 0.03913115
## 2   1e-02   0.5      2 0.2066717 0.02459985
## 3   1e-01   0.5      2 0.1346013 0.04045215
## 4   1e+00   0.5      2 0.1334015 0.04167428
## 5   5e+00   0.5      2 0.1576084 0.03699263
## 6   1e+01   0.5      2 0.1803726 0.03344241
## 7   1e+02   0.5      2 0.2529667 0.04313562
## 8   1e-03   1.0      2 0.2560198 0.03328955
## 9   1e-02   1.0      2 0.1588127 0.02100216
## 10  1e-01   1.0      2 0.1271540 0.04298727
## 11  1e+00   1.0      2 0.1519365 0.03696317
## 12  5e+00   1.0      2 0.2042271 0.03107685
## 13  1e+01   1.0      2 0.2198195 0.03336798
## 14  1e+02   1.0      2 0.2731809 0.05179575
## 15  1e-03   2.0      2 0.1894369 0.02107246
## 16  1e-02   2.0      2 0.1304596 0.04250981
## 17  1e-01   2.0      2 0.1370726 0.04035350
## 18  1e+00   2.0      2 0.1971576 0.02940547
## 19  5e+00   2.0      2 0.2403235 0.03783207
## 20  1e+01   2.0      2 0.2731809 0.05179575
## 21  1e+02   2.0      2 0.2731809 0.05179575
## 22  1e-03   3.0      2 0.1630548 0.02031295
## 23  1e-02   3.0      2 0.1270127 0.04297037
## 24  1e-01   3.0      2 0.1496705 0.03706794
## 25  1e+00   3.0      2 0.2182405 0.03330025
## 26  5e+00   3.0      2 0.2731812 0.05180174
## 27  1e+01   3.0      2 0.2731812 0.05180174
## 28  1e+02   3.0      2 0.2731812 0.05180174
## 29  1e-03   4.0      2 0.1420894 0.02886099
## 30  1e-02   4.0      2 0.1286486 0.04185520
## 31  1e-01   4.0      2 0.1651422 0.03773183
## 32  1e+00   4.0      2 0.2311673 0.03506680
## 33  5e+00   4.0      2 0.2731809 0.05179575
## 34  1e+01   4.0      2 0.2731809 0.05179575
## 35  1e+02   4.0      2 0.2731809 0.05179575
## 36  1e-03   0.5      3 0.1414858 0.02930834
## 37  1e-02   0.5      3 0.1170883 0.04161977
## 38  1e-01   0.5      3 0.1172960 0.06223539
## 39  1e+00   0.5      3 0.1547983 0.07114794
## 40  5e+00   0.5      3 0.1824597 0.06594286
## 41  1e+01   0.5      3 0.1920270 0.07872333
## 42  1e+02   0.5      3 0.2151585 0.07038655
## 43  1e-03   1.0      3 0.1199604 0.04083700
## 44  1e-02   1.0      3 0.1141842 0.05818515
## 45  1e-01   1.0      3 0.1504562 0.07225655
## 46  1e+00   1.0      3 0.1904213 0.07620357
## 47  5e+00   1.0      3 0.2151585 0.07038655
## 48  1e+01   1.0      3 0.2151585 0.07038655
## 49  1e+02   1.0      3 0.2151585 0.07038655
## 50  1e-03   2.0      3 0.1118502 0.05466670
## 51  1e-02   2.0      3 0.1444370 0.07164791
## 52  1e-01   2.0      3 0.1859255 0.06970397
## 53  1e+00   2.0      3 0.2151585 0.07038655
## 54  5e+00   2.0      3 0.2151585 0.07038655
## 55  1e+01   2.0      3 0.2151585 0.07038655
## 56  1e+02   2.0      3 0.2151585 0.07038655
## 57  1e-03   3.0      3 0.1268594 0.06999682
## 58  1e-02   3.0      3 0.1656884 0.06722630
## 59  1e-01   3.0      3 0.2011184 0.07454636
## 60  1e+00   3.0      3 0.2151609 0.07038467
## 61  5e+00   3.0      3 0.2151609 0.07038467
## 62  1e+01   3.0      3 0.2151609 0.07038467
## 63  1e+02   3.0      3 0.2151609 0.07038467
## 64  1e-03   4.0      3 0.1380603 0.07077995
## 65  1e-02   4.0      3 0.1827353 0.06619124
## 66  1e-01   4.0      3 0.2151585 0.07038655
## 67  1e+00   4.0      3 0.2151585 0.07038655
## 68  5e+00   4.0      3 0.2151585 0.07038655
## 69  1e+01   4.0      3 0.2151585 0.07038655
## 70  1e+02   4.0      3 0.2151585 0.07038655
## 71  1e-03   0.5      4 0.1723381 0.02687206
## 72  1e-02   0.5      4 0.1745745 0.09661690
## 73  1e-01   0.5      4 0.2322382 0.14263048
## 74  1e+00   0.5      4 0.3238749 0.11425828
## 75  5e+00   0.5      4 0.3743967 0.13332628
## 76  1e+01   0.5      4 0.3841191 0.13537137
## 77  1e+02   0.5      4 0.3948103 0.13856391
## 78  1e-03   1.0      4 0.1984546 0.13803729
## 79  1e-02   1.0      4 0.2426577 0.12739937
## 80  1e-01   1.0      4 0.3402189 0.11322789
## 81  1e+00   1.0      4 0.3948103 0.13856391
## 82  5e+00   1.0      4 0.3948103 0.13856391
## 83  1e+01   1.0      4 0.3948103 0.13856391
## 84  1e+02   1.0      4 0.3948103 0.13856391
## 85  1e-03   2.0      4 0.2643480 0.11860975
## 86  1e-02   2.0      4 0.3503557 0.10962565
## 87  1e-01   2.0      4 0.3948103 0.13856391
## 88  1e+00   2.0      4 0.3948103 0.13856391
## 89  5e+00   2.0      4 0.3948103 0.13856391
## 90  1e+01   2.0      4 0.3948103 0.13856391
## 91  1e+02   2.0      4 0.3948103 0.13856391
## 92  1e-03   3.0      4 0.3350223 0.11508852
## 93  1e-02   3.0      4 0.3937248 0.13789661
## 94  1e-01   3.0      4 0.3948081 0.13856314
## 95  1e+00   3.0      4 0.3948081 0.13856314
## 96  5e+00   3.0      4 0.3948081 0.13856314
## 97  1e+01   3.0      4 0.3948081 0.13856314
## 98  1e+02   3.0      4 0.3948081 0.13856314
## 99  1e-03   4.0      4 0.3659839 0.12252747
## 100 1e-02   4.0      4 0.3948103 0.13856391
## 101 1e-01   4.0      4 0.3948103 0.13856391
## 102 1e+00   4.0      4 0.3948103 0.13856391
## 103 5e+00   4.0      4 0.3948103 0.13856391
## 104 1e+01   4.0      4 0.3948103 0.13856391
## 105 1e+02   4.0      4 0.3948103 0.13856391
bestmod <- tune.auto3$best.model
bestmod
## 
## Call:
## best.tune(method = svm, train.x = median_mpg ~ ., data = bd_auto, 
##     ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100), gamma = c(0.5, 
##         1, 2, 3, 4), degree = c(2, 3, 4)), kernel = "polynomial")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  polynomial 
##        cost:  0.001 
##      degree:  3 
##       gamma:  2 
##      coef.0:  0 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  339
  • From the summary for the polynomial-kernel model, the best fit has cost = 0.001, degree = 3, and gamma = 2.

D

Make some plots to back up your assertions in (b) and (c). Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing

To support the results of the previous parts, we plot the models fitted with the optimal kernels and parameters. In each plot the Y-axis corresponds to the variable mpg and the X-axis to one of the covariates, to show the relationship between each covariate and the response.

bd_auto <- data.frame(Auto,median_mpg)
bd_auto$median_mpg <- as.factor(bd_auto$median_mpg)

svm.linear <- svm(median_mpg ~ ., data = bd_auto, kernel = "linear", cost = 0.1)
svm.poly <- svm(median_mpg ~ ., data = bd_auto, kernel = "polynomial", cost = 0.001, degree = 3, gamma=2)
svm.radial <- svm(median_mpg ~ ., data = bd_auto, kernel = "radial", cost = 1, gamma = 0.5)
plotpairs <- function(fit) {
  # Plot the fitted model (the `fit` argument, not a fixed object) against each covariate
  for (name in names(bd_auto)[!(names(bd_auto) %in% c("mpg", "median_mpg", "name"))]) {
    plot(fit, bd_auto, as.formula(paste("mpg ~", name)))
  }
}
plotpairs(svm.linear)

plotpairs(svm.poly)

plotpairs(svm.radial)

PUNTO 8

This problem involves the OJ data set which is part of the ISLR package.

Parts:

A

Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

We create a training set containing a random sample of 800 observations and a test set containing the remaining observations.

set.seed(1)
train <- sample(nrow(OJ), 800)
OJ.train <- OJ[train, ]
OJ.test <- OJ[-train, ]

B

Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

SVM fitted to the training data with cost = 0.01:

svm.oj <- svm(Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
summary(svm.oj)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  435
## 
##  ( 219 216 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

C

What are the training and test error rates?

We will use the confusion matrices to answer this question, as follows:

Confusion matrix, training set

train.pred <- predict(svm.oj, OJ.train)
svm_oj <- table(OJ.train$Purchase, train.pred)
svm_oj %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
CH MM
CH 420 65
MM 75 240

Classification rates

clasificacion_svmbuena <- (svm_oj[1,1]+svm_oj[2,2])/length(OJ.train$Purchase)
clasificacion_svmala <- (svm_oj[1,2]+svm_oj[2,1])/length(OJ.train$Purchase)
kable(clasificacion_svmbuena, col.names="Clasificacion buena")
Clasificacion buena
0.825
kable(clasificacion_svmala, col.names="Clasificación mala")
Clasificación mala
0.175

Confusion matrix, test set

test.pred <- predict(svm.oj, OJ.test)
svm_ojt <- table(OJ.test$Purchase, test.pred)
svm_ojt %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Classification rates

clasificacion_svmbuena <- (svm_ojt[1,1]+svm_ojt[2,2])/length(OJ.test$Purchase)
clasificacion_svmala <- (svm_ojt[1,2]+svm_ojt[2,1])/length(OJ.test$Purchase)
kable(clasificacion_svmbuena, col.names="Clasificacion buena")
Clasificacion buena
0.8222222
kable(clasificacion_svmala, col.names="Clasificación mala")
Clasificación mala
0.1777778
  • The error rate for the fitted model is 17.5% on the training data and about 17.8% on the test data.

D

Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

Selecting an optimal cost with tune().

set.seed(2)
tune.out <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear", ranges = list(cost = seq(0.01, 10, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##  4.01
## 
## - best performance: 0.165 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.17625 0.04059026
## 2  0.26 0.16750 0.04257347
## 3  0.51 0.17125 0.04126894
## 4  0.76 0.16875 0.04259385
## 5  1.01 0.17000 0.04090979
## 6  1.26 0.16750 0.03782269
## 7  1.51 0.16750 0.03782269
## 8  1.76 0.16750 0.03782269
## 9  2.01 0.16750 0.03782269
## 10 2.26 0.16750 0.03782269
## 11 2.51 0.16750 0.03827895
## 12 2.76 0.16875 0.03738408
## 13 3.01 0.16875 0.03738408
## 14 3.26 0.16750 0.03782269
## 15 3.51 0.16625 0.03634805
## 16 3.76 0.16625 0.03634805
## 17 4.01 0.16500 0.03809710
## 18 4.26 0.16500 0.03809710
## 19 4.51 0.16500 0.03809710
## 20 4.76 0.16625 0.03634805
## 21 5.01 0.16750 0.03545341
## 22 5.26 0.16750 0.03545341
## 23 5.51 0.16750 0.03545341
## 24 5.76 0.16750 0.03545341
## 25 6.01 0.16750 0.03545341
## 26 6.26 0.16750 0.03545341
## 27 6.51 0.16750 0.03545341
## 28 6.76 0.16625 0.03729108
## 29 7.01 0.16750 0.03545341
## 30 7.26 0.16750 0.03545341
## 31 7.51 0.16750 0.03545341
## 32 7.76 0.16750 0.03545341
## 33 8.01 0.16750 0.03545341
## 34 8.26 0.16750 0.03545341
## 35 8.51 0.17000 0.03736085
## 36 8.76 0.17000 0.03736085
## 37 9.01 0.17000 0.03736085
## 38 9.26 0.17000 0.03736085
## 39 9.51 0.17000 0.03736085
## 40 9.76 0.17000 0.03736085

The cross-validation error is lowest at cost = 4.01. With this in mind, we extract the best model:

bestmod <- tune.out$best.model
bestmod
## 
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = OJ.train, 
##     ranges = list(cost = seq(0.01, 10, by = 0.25)), kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  4.01 
## 
## Number of Support Vectors:  329
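The printed summary can also be queried programmatically: a tune object stores the full cross-validation grid in its performances data frame. A short sketch using the tune.out object from above:

```r
# Inspect the CV grid directly instead of reading the printed summary.
perf <- tune.out$performances        # columns: cost, error, dispersion
perf[which.min(perf$error), ]        # grid row with the lowest CV error
plot(tune.out)                       # CV error as a function of cost
```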

E

Compute the training and test error rates using this new value for cost.

Fitted model

Fit the SVM with the optimal cost.

svm.li <- svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = tune.out$best.parameter$cost)

Confusion matrix on the training data

train.pred <- predict(svm.li, OJ.train)
svm1 <- table(OJ.train$Purchase, train.pred)
svm1 %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  423   62
MM   70  245

Classification rates

clasificacion_svmbue <- (svm1[1,1]+svm1[2,2])/length(OJ.train$Purchase)
clasificacion_svmal <- (svm1[1,2]+svm1[2,1])/length(OJ.train$Purchase)
kable(clasificacion_svmbue, col.names="Correct classification")
Correct classification
0.835
kable(clasificacion_svmal, col.names="Misclassification")
Misclassification
0.165

Confusion matrix on the test data

test.pred <- predict(svm.li, OJ.test)
svm_test <-table(OJ.test$Purchase, test.pred)
svm_test %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  155   13
MM   29   73

Classification rates

clasificacion_svmbu <- (svm_test[1,1]+svm_test[2,2])/length(OJ.test$Purchase)
clasificacion_svma <- (svm_test[1,2]+svm_test[2,1])/length(OJ.test$Purchase)
kable(clasificacion_svmbu, col.names="Correct classification")
Correct classification
0.8444444
kable(clasificacion_svma, col.names="Misclassification")
Misclassification
0.1555556
  • According to the confusion matrices above, the training error rate for the tuned model is about 16.5%, while the test error rate is about 15.6%.

F

Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

Fitted model

Fit an SVM with a radial kernel and the default gamma.

svm.radial <- svm(Purchase ~ ., kernel = "radial", data = OJ.train)
summary(svm.radial)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  373
## 
##  ( 188 185 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Confusion matrix on the training data

train.predr <- predict(svm.radial, OJ.train)
svm_rad <- table(OJ.train$Purchase, train.predr)
svm_rad %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  441   44
MM   77  238

Classification rates

clasificacion_svmbue <- (svm_rad[1,1]+svm_rad[2,2])/length(OJ.train$Purchase)
clasificacion_svmal <- (svm_rad[1,2]+svm_rad[2,1])/length(OJ.train$Purchase)
kable(clasificacion_svmbue, col.names="Correct classification")
Correct classification
0.84875
kable(clasificacion_svmal, col.names="Misclassification")
Misclassification
0.15125

Confusion matrix on the test data

test.predr <- predict(svm.radial, OJ.test)
svm_radt <- table(OJ.test$Purchase, test.predr)
svm_radt %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Classification rates

clasificacion_svmbue <- (svm_radt[1,1]+svm_radt[2,2])/length(OJ.test$Purchase)
clasificacion_svmal <- (svm_radt[1,2]+svm_radt[2,1])/length(OJ.test$Purchase)
kable(clasificacion_svmbue, col.names="Correct classification")
Correct classification
0.8148148
kable(clasificacion_svmal, col.names="Misclassification")
Misclassification
0.1851852
  • The radial kernel with the default gamma uses 373 support vectors, 188 belonging to class CH and the remaining 185 to class MM. It has a training error of about 15.1% and a test error of about 18.5%, rates very similar to those of the linear kernel. We now use cross-validation to find the optimal cost.

Selecting the optimal cost via cross-validation

set.seed(2)
tune.outra <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial", ranges = list(cost = seq(0.01, 10, by = 0.25)))
summary(tune.outra)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##  1.26
## 
## - best performance: 0.17 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.39375 0.03240906
## 2  0.26 0.17375 0.03972562
## 3  0.51 0.17500 0.03908680
## 4  0.76 0.17250 0.03855011
## 5  1.01 0.17125 0.03283481
## 6  1.26 0.17000 0.03446012
## 7  1.51 0.17500 0.03632416
## 8  1.76 0.17750 0.03717451
## 9  2.01 0.17750 0.03525699
## 10 2.26 0.17875 0.03283481
## 11 2.51 0.17750 0.03622844
## 12 2.76 0.17875 0.03387579
## 13 3.01 0.18250 0.03343734
## 14 3.26 0.18375 0.03438447
## 15 3.51 0.18375 0.03438447
## 16 3.76 0.18375 0.03682259
## 17 4.01 0.18375 0.03682259
## 18 4.26 0.18500 0.03899786
## 19 4.51 0.18375 0.03682259
## 20 4.76 0.18375 0.03682259
## 21 5.01 0.18375 0.03682259
## 22 5.26 0.18375 0.03682259
## 23 5.51 0.18500 0.03717451
## 24 5.76 0.18500 0.03717451
## 25 6.01 0.18500 0.03717451
## 26 6.26 0.18500 0.03717451
## 27 6.51 0.18500 0.03717451
## 28 6.76 0.18625 0.03747684
## 29 7.01 0.18750 0.03535534
## 30 7.26 0.18750 0.03535534
## 31 7.51 0.18750 0.03535534
## 32 7.76 0.18750 0.03535534
## 33 8.01 0.18750 0.03535534
## 34 8.26 0.18625 0.03356689
## 35 8.51 0.18750 0.03118048
## 36 8.76 0.18625 0.03087272
## 37 9.01 0.18500 0.03216710
## 38 9.26 0.18500 0.03050501
## 39 9.51 0.18625 0.03030516
## 40 9.76 0.18750 0.03173239

Fit the SVM with the optimal cost

svm.radialc <- svm(Purchase ~ ., kernel = "radial", data = OJ.train, cost = tune.outra$best.parameter$cost)
summary(svm.radialc)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial", cost = tune.outra$best.parameter$cost)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1.26 
## 
## Number of Support Vectors:  367
## 
##  ( 186 181 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Confusion matrix for the tuned radial model on the training data

train.pred <- predict(svm.radialc, OJ.train)
mta<- table(OJ.train$Purchase, train.pred)
mta %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  441   44
MM   74  241

Classification rates

clasificacion_svmbuenc <- (mta[1,1]+mta[2,2])/length(OJ.train$Purchase)
clasificacion_svmalac <- (mta[1,2]+mta[2,1])/length(OJ.train$Purchase)
kable(clasificacion_svmbuenc, col.names="Correct classification")
Correct classification
0.8525
kable(clasificacion_svmalac, col.names="Misclassification")
Misclassification
0.1475

Confusion matrix for the tuned radial model on the test data

test.pred <- predict(svm.radialc, OJ.test)
mop <- table(OJ.test$Purchase, test.pred)
mop %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  152   16
MM   32   70

Classification rates

clasificacion_svmbueop <- (mop[1,1]+mop[2,2])/length(OJ.test$Purchase)
clasificacion_svmalop <- (mop[1,2]+mop[2,1])/length(OJ.test$Purchase)
kable(clasificacion_svmbueop, col.names="Correct classification")
Correct classification
0.8222222
kable(clasificacion_svmalop, col.names="Misclassification")
Misclassification
0.1777778
  • The tuned classifier has a training error of about 14.8% and a test error of about 17.8%, an improvement over the fit with the default cost; the optimal cost found above lowers both the training and the test error.

G

Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.

Fitted model

Fit an SVM with a polynomial kernel of degree 2.

svm.poly <- svm(Purchase ~ ., kernel = "polynomial", data = OJ.train, degree=2)
summary(svm.poly)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial", 
##     degree = 2)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  447
## 
##  ( 225 222 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Confusion matrix on the training data

train.predrpo <- predict(svm.poly, OJ.train)
svm_poly <- table(OJ.train$Purchase, train.predrpo)
svm_poly %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  449   36
MM  110  205

Classification rates

clasificacion_svmbuepo <- (svm_poly[1,1]+svm_poly[2,2])/length(OJ.train$Purchase)
clasificacion_svmalpo <- (svm_poly[1,2]+svm_poly[2,1])/length(OJ.train$Purchase)
kable(clasificacion_svmbuepo, col.names="Correct classification")
Correct classification
0.8175
kable(clasificacion_svmalpo, col.names="Misclassification")
Misclassification
0.1825

Confusion matrix on the test data

test.predpol <- predict(svm.poly, OJ.test)
svm_pol <-table(OJ.test$Purchase, test.predpol)
svm_pol %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  153   15
MM   45   57

Classification rates

clasificacion_svmbuep <- (svm_pol[1,1]+svm_pol[2,2])/length(OJ.test$Purchase)
clasificacion_svmalp <- (svm_pol[1,2]+svm_pol[2,1])/length(OJ.test$Purchase)
kable(clasificacion_svmbuep, col.names="Correct classification")
Correct classification
0.7777778
kable(clasificacion_svmalp, col.names="Misclassification")
Misclassification
0.2222222
  • The polynomial kernel with degree=2 uses 447 support vectors, 225 from class CH and the remaining 222 from class MM. The classifier has a training error of about 18.3% and a test error of about 22.2%, slightly worse than the linear kernel. We now use cross-validation to find the optimal cost.

Selecting the optimal cost via cross-validation

set.seed(2)
tune.outrapo <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial",degree=2, ranges = list(cost = seq(0.01, 10, by = 0.25)))
summary(tune.outrapo)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##  2.76
## 
## - best performance: 0.17625 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.39000 0.03670453
## 2  0.26 0.20625 0.04050463
## 3  0.51 0.20250 0.03670453
## 4  0.76 0.20000 0.04124790
## 5  1.01 0.19750 0.04116363
## 6  1.26 0.19875 0.04143687
## 7  1.51 0.19250 0.03736085
## 8  1.76 0.19250 0.03545341
## 9  2.01 0.18500 0.03809710
## 10 2.26 0.18250 0.03872983
## 11 2.51 0.17875 0.04566256
## 12 2.76 0.17625 0.04226652
## 13 3.01 0.17625 0.04505013
## 14 3.26 0.18000 0.04005205
## 15 3.51 0.17875 0.03955042
## 16 3.76 0.17875 0.03998698
## 17 4.01 0.18000 0.03961621
## 18 4.26 0.18250 0.04005205
## 19 4.51 0.18125 0.04007372
## 20 4.76 0.17875 0.03634805
## 21 5.01 0.18000 0.03872983
## 22 5.26 0.18125 0.04050463
## 23 5.51 0.18000 0.04133199
## 24 5.76 0.18000 0.04133199
## 25 6.01 0.18000 0.03872983
## 26 6.26 0.17750 0.03717451
## 27 6.51 0.17875 0.03682259
## 28 6.76 0.17875 0.03682259
## 29 7.01 0.17875 0.03682259
## 30 7.26 0.17875 0.03682259
## 31 7.51 0.18000 0.03496029
## 32 7.76 0.18000 0.03343734
## 33 8.01 0.18000 0.03343734
## 34 8.26 0.17875 0.03537988
## 35 8.51 0.17875 0.03537988
## 36 8.76 0.18000 0.03827895
## 37 9.01 0.18125 0.03875224
## 38 9.26 0.18000 0.03827895
## 39 9.51 0.18000 0.03827895
## 40 9.76 0.18125 0.03830162

Fit the SVM with the optimal cost

svm.polyc <- svm(Purchase ~ ., kernel = "polynomial",degree=2, data = OJ.train, cost = tune.outrapo$best.parameter$cost)
summary(svm.polyc)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial", 
##     degree = 2, cost = tune.outrapo$best.parameter$cost)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  2.76 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  387
## 
##  ( 197 190 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Confusion matrix for the tuned polynomial model on the training data

train.predpo <- predict(svm.polyc, OJ.train)
poli<- table(OJ.train$Purchase, train.predpo)
poli %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  452   33
MM   92  223

Classification rates

clasificacion_svmbuenp <- (poli[1,1]+poli[2,2])/length(OJ.train$Purchase)
clasificacion_svmalap <- (poli[1,2]+poli[2,1])/length(OJ.train$Purchase)
kable(clasificacion_svmbuenp, col.names="Correct classification")
Correct classification
0.84375
kable(clasificacion_svmalap, col.names="Misclassification")
Misclassification
0.15625

Confusion matrix for the tuned polynomial model on the test data

test.predpo <- predict(svm.polyc, OJ.test)
polyc <- table(OJ.test$Purchase, test.predpo)
polyc %>% kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
     CH   MM
CH  153   15
MM   40   62

Classification rates

clasificacion_svmbueopt <- (polyc[1,1]+polyc[2,2])/length(OJ.test$Purchase)
clasificacion_svmalopt <- (polyc[1,2]+polyc[2,1])/length(OJ.test$Purchase)
kable(clasificacion_svmbueopt, col.names="Correct classification")
Correct classification
0.7962963
kable(clasificacion_svmalopt, col.names="Misclassification")
Misclassification
0.2037037
  • The tuned classifier has a training error of about 15.6% and a test error of about 20.4%, an improvement over the default-cost fit; the optimal cost found above lowers both the training and the test error.

H

Overall, which approach seems to give the best results on this data?

  • Comparing test error rates across all of the fits, the linear kernel with the cross-validated cost (4.01) attains the lowest test error (about 15.6%), followed by the tuned radial model (17.8%); the polynomial kernel performs worst. The tuned radial model does achieve the lowest training error (about 14.8%), but judged on test performance, which is the criterion that matters, the tuned linear SVM gives the best results on these data.
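This overall comparison can be assembled into one table. A minimal sketch, assuming the fitted objects from the previous parts are still in the workspace; `svm.linear` is an assumed name for the default-cost linear fit from part (b), which is not shown in this section:

```r
# Collect the test error of every fitted SVM in a single sorted vector.
models <- list(linear.default = svm.linear, linear.tuned = svm.li,
               radial.default = svm.radial, radial.tuned = svm.radialc,
               poly.default   = svm.poly,   poly.tuned   = svm.polyc)
test.err <- sapply(models, function(m) mean(predict(m, OJ.test) != OJ.test$Purchase))
sort(test.err)  # smallest test error first
```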

Essay

IS STATISTICAL LEARNING A FUNDAMENTAL PIECE IN UNDERSTANDING AIR QUALITY IN MEDELLÍN?

The cardiovascular and respiratory health of the population, in both the short and the long term, depends largely on air quality. Air pollution keeps rising worldwide; it is now one of the most severe environmental problems globally, present in every society regardless of socioeconomic development, and it has a strong impact on human health. Economic growth and urbanization, together with activities such as the oil industry, services, agro-industry, and the growing number of motor vehicles, produce high volumes of pollutants.

The Aburrá Valley metropolitan area, and the city of Medellín in particular, is no stranger to this problem. For several years it has implemented strategies to reduce it, focused mainly on the number of cars in the city. To this end it has adopted the Estrategia Nacional de Calidad del Aire, a national strategy that prioritizes actions to cut pollutant emissions from motor vehicles and productive service activities, framing them within the Sustainable Development Goals (SDGs), since air pollution will affect the population for at least the next 15 years.

On its open-data portal, the Medellín city government publishes pollutant measurements from the air-quality monitoring network, broken down by pollutant, along with a dataset called Encuesta de Calidad de Vida – Medellín Cómo Vamos, a survey of 342 questions covering 15 social dimensions, including environmental quality as perceived by the residents of each neighborhood. With these data and statistical learning, several air-quality phenomena in the city can be better understood. For example, Principal Component Analysis could be used to build an intra-urban indicator that quantifies the severity of air pollution by combining citizens' perceptions with the monitoring data, or to estimate the number of vehicles per household in each neighborhood and thus design strategies beyond the environmental pico y placa (license-plate rationing) to reduce private-car travel.

However, the outlook for air quality in the coming years is discouraging, given the growth of the vehicle fleet and of commerce. It is therefore also important to design other surveys and data-collection efforts in order to identify more reliable risk factors, attack the problem at its root, and provide a larger and longer-lasting solution to this public-health problem, as well as perception surveys on public transport and cycling to support strategies that consolidate these modes of transport over the private car.

In conclusion, the main consequence of poor air quality is its effect on health; one of its main causes is the ever-growing number of vehicles; and an effective solution is to identify the relationships that drive poor air quality from the data collected day by day.

References